Approximately Optimal Auctions for Selling Privacy 
when Costs are Correlated with Data* 



Lisa Fleischer and Yu-Han Lyu 

Department of Computer Science 
Dartmouth 



March 2, 2013 



Abstract 

We consider a scenario in which a database stores sensitive data of users and an analyst wants to 
estimate statistics of the data. The users may suffer a cost when their data are used in which case 
they should be compensated. The analyst wishes to get an accurate estimate, while the users want to 
maximize their utility. We want to design a mechanism that can estimate statistics accurately without 
compromising users' privacy. 

Since users' costs and sensitive data may be correlated, it is important to protect the privacy of both 
data and cost. We model this correlation by assuming that a user's unknown sensitive data determines a 
distribution from a set of publicly known distributions and a user's cost is drawn from that distribution. 
We propose a stronger model of privacy preserving mechanism where users are compensated whenever 
they reveal information about their data to the mechanism. In this model, we design a Bayesian incentive 
compatible and privacy preserving mechanism that guarantees accuracy and protects the privacy of both 
cost and data. 



*E-mail: {lkf,yuhanlyu}@cs.dartmouth.edu. Partially supported by NSF grants CCF-0728869 and CCF-1016778. 



1 Introduction 



Using the Internet, it is fairly easy to collect sensitive personal data. Online service providers implicitly 
compensate users who provide their personal data, by offering improved services based on their data. However,
this implicit exchange may not be fair to the individual, since different people may have different costs
(a loss in expected utility over future events) for use of their data. Moreover, companies rarely give
well-defined guarantees concerning data privacy and compensation. When the compensation is less than the 
individual's perceived cost, the individual may choose not to participate. Here, we explore mechanisms to 
fairly compensate individuals for use of their personal data. 

In order to motivate users to participate in a mechanism, the payment to a user should be at least the cost 
to the user. Thus, the mechanism should learn information about users' costs. Ghosh and Roth [8] initiate a
study of this problem. Their mechanism asks users to report their costs for the use of their data to estimate 
statistics, and then selects some of the users (based on their stated costs) to determine the statistics, and pays 
these users accordingly. This mechanism is problematic when costs and personal data are correlated, since 
users may be reluctant to reveal their costs if they are not guaranteed adequate compensation up front. For 
example, suppose that a database indicates whether a vehicle has been damaged. When the database can 
be publicly accessed, the owner of a damaged car cannot sell the car for the same price as the price of an 
undamaged car. Thus, his cost for revealing data is higher than the owner of an undamaged car. Revealing 
information about the costs may also reveal information about whether the car is damaged. Thus, it is 
important to also guarantee privacy of individual payments. 

We study this problem where costs are correlated with data. We model this correlation by assuming that 
a user's unknown data determines a distribution from a set of accurate and publicly known distributions and 
the user's cost is drawn from that distribution. We propose a model of a privacy preserving mechanism where 
users are compensated whenever they reveal any information about their data to the mechanism, whether 
directly, or indirectly by revealing their costs. In this model, we design a Bayesian incentive compatible 
and individually rational mechanism, which produces accurate statistics and protects the privacy of data and 
costs. 

Problem Setting. There are $n$ users, which we call players, denoted by $[n]$. Each player $i$ has sensitive data
$D_i \in [h]$, stored in a database $D \in [h]^n$. Initially $D_i$ is the private information of player $i$. However, since
$D_i$ is also in the database, its value may be verified with player $i$'s permission. In addition, player $i$ has a
value for his loss of privacy of his data. This value $v_i$ is private to player $i$, but it is correlated with $D_i$. This
correlation is modeled as follows: if $D_i = t \in [h]$, then $v_i \sim F_t$, where $F_t$ is a distribution of privacy costs
for players of type $t$ that is known to all players and the mechanism. $F_t$ correctly represents the distribution
of costs of type $t$ players.

A query is a function $Q : [h]^n \to \mathbb{R}$, mapping a database to a response. An example of a query is "what
is the number of people $i$ in the database $D$ with $D_i = j$?". A data analyst wants $Q(D)$. Since the data are
sensitive, the data analyst accesses the database through a privacy preserving algorithm $A$. Therefore, the
data analyst does not receive $Q(D)$ but receives an estimate $A(D)$. To ensure the estimate is accurate, the
error $|Q(D) - A(D)|$ should be small with high probability.

Differential privacy, introduced in [5], is an accepted way to measure privacy and privacy preserving
algorithms. Two databases $D$ and $D'$ are adjacent if they differ in only one entry. An algorithm $A$ satisfies
$\epsilon$-differential privacy, where $\epsilon \ge 0$, if for any pair of adjacent databases $D$ and $D'$ and any set $I \subseteq \mathbb{R}$,
$\Pr[A(D) \in I] \le e^{\epsilon} \Pr[A(D') \in I]$. When $\epsilon = 0$, it implies that the algorithm does not depend on $D$. If the
error $|Q(D) - A(D)|$ is small with high probability, then the algorithm should have large $\epsilon$. Thus, privacy
guarantees come at the expense of accuracy.
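
As a small illustration of the definition (not part of the mechanism in this paper), the sketch below applies randomized response to a one-entry database and computes the worst-case ratio of output probabilities on the two adjacent databases. The function names and the choice of randomized response are assumptions made only for this example.

```python
import math

def randomized_response_probs(bit: int, eps: float) -> dict:
    """Output distribution of randomized response on one private bit.

    The true bit is reported with probability e^eps / (1 + e^eps)
    and flipped otherwise.
    """
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: keep, 1 - bit: 1.0 - keep}

def max_ratio(eps: float) -> float:
    """Largest ratio Pr[A(D) = o] / Pr[A(D') = o] over outputs o, where
    D and D' are the two adjacent one-entry databases (0) and (1)."""
    p0 = randomized_response_probs(0, eps)
    p1 = randomized_response_probs(1, eps)
    return max(max(p0[o] / p1[o], p1[o] / p0[o]) for o in (0, 1))

# The worst-case ratio should match e^eps up to rounding.
for eps in (0.0, 0.1, 1.0):
    print(eps, max_ratio(eps), math.exp(eps))
```

With eps = 0 both databases induce the same output distribution, which matches the remark that the algorithm then does not depend on the data.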

Although an $\epsilon$-differentially private algorithm can protect sensitive data, if a player allows his data to be
used, he may incur a cost. We model this cost as linear in the privacy loss $\epsilon$ and his expected cost $v_i$.¹ Thus,
for player $i$ to agree to the use of his data, his expected payment should be at least $\epsilon v_i$.

A mechanism specifies a set of actions that players can take. The players take actions based on their 
data and private costs. Thus, the input of the mechanism is a database and a vector of actions. The outputs 
are an estimate $\hat{s}$ and a payment vector $p = (p_1, \ldots, p_n)$. Since player $i$ has a linear cost $\epsilon v_i$, the utility of
player $i$ is $p_i - \epsilon v_i$ if $D_i$ is used in the mechanism, otherwise the utility is $p_i$. We assume that all players are
rational and want to maximize their utilities. A mechanism is a direct mechanism if the action set equals the 
set of all real numbers. That is, a direct mechanism asks players to report their costs. A direct mechanism 
is truthful if every player reports his true cost in order to maximize his utility. Truth telling is a concept 
defined for direct mechanisms. In this paper, we propose an indirect mechanism. Thus, we want to extend 
the notion of truthfulness to indirect mechanisms. In our mechanism, there is a straightforward mapping, 
described in Section 3, from a player's type set to the player's action set. We say that a player decides truthfully
if he picks the strategy corresponding to his type under this mapping. 

In our paper, we will assume that the query/goal of the analyst is to estimate $n_j = |\{i : D_i = j\}|$.
Without loss of generality, we assume throughout the paper that the data analyst wants to estimate $n_1$. We
seek to design a mechanism with the following properties.

1. Accuracy: A mechanism $M$ is $k$-accurate if, for any database $D$, $\Pr[|\hat{s} - n_1| \ge k] \le \frac{1}{3}$ when every
player decides truthfully. Note that the accuracy guarantee is independent of the size of the database:
the number $k$ is fixed no matter how large the database or the sampled set is.

2. Differential Privacy: The estimate and payments satisfy $\epsilon$-differential privacy.

3. Truthfulness: A mechanism is dominant strategy truthful if, for every player, deciding truthfully
maximizes his utility. A mechanism is Bayesian incentive compatible (BIC) if, for every player, 
assuming that other players' costs are drawn from F according to their data and decide truthfully, 
deciding truthfully maximizes his utility. 

4. Individual Rationality: If a player's utility is non-negative, then he should be willing to participate. 
A mechanism is ex-post individually rational (EPIR) if the utility is non-negative for every player 
when he decides truthfully. A mechanism is ex-interim individually rational (EIIR) if the expected 
utility is non-negative for every player when he decides truthfully, where the randomness comes from 
the mechanism and the costs of other players. 

5. Payment Minimization: The total payment should be as small as possible.

To get permission to use a player's data, the mechanism must compensate the player by at least his perceived
loss of privacy. But since costs are correlated with data, players may be reluctant to reveal their true
costs, unless they will be compensated for this. To avoid this seeming chicken-and-egg problem, the
mechanism designer cannot resort to the revelation principle, which states that any mechanism can be realized
as a direct and truthful mechanism. In fact, Ghosh and Roth [8] prove that if costs and data can be arbitrarily
correlated and a player's cost of privacy can be unbounded, then for any $k < n/2$, no $k$-accurate, direct,
dominant strategy truthful, EPIR, privacy preserving mechanism exists. On the other hand, we give a mechanism
that provides $k$-accuracy for any input value $k$ when costs are correlated with data, and there is no bound on
players' cost of privacy. We get around the lower bound of [8] by using an indirect mechanism, and modeling the
correlation of values and data via publicly known (and allowably unbounded) distributions.

¹We can view this cost as due to the change in his utility from future events that depend on the answer he gives to the analyst.
This cost is approximately linear in $\epsilon$ and his expected utility, denoted by $v_i$. Let $g(A(D))$ be the distribution of future events that
depends on $A(D)$. Let $w_i$ be player $i$'s utility for future events. Since $A$ is $\epsilon$-differentially private, $g \circ A$ is also $\epsilon$-differentially
private. Thus, for random variables $y \sim g(A(D))$ and $y' \sim g(A(D'))$ and event $b$, $\Pr[y = b] \le e^{\epsilon} \Pr[y' = b]$. Therefore,
$E_{y \sim g(A(D))}[w_i(y)] - E_{y \sim g(A(D'))}[w_i(y)]$ is approximately $\epsilon E_{y \sim g(A(D'))}[w_i(y)]$ or $-\epsilon E_{y \sim g(A(D'))}[w_i(y)]$, when $\epsilon$ is small.

Privacy Issues when Costs are Correlated with Data. The objective of a privacy preserving mechanism
is that the increase in knowledge about a player's data due to output of the mechanism is small. Previous 
work on privacy in statistical databases assumes that the mechanism is associated with the database, such 
that the mechanism can access the whole database without compromising a player's privacy. However, if 
the mechanism is separated from the database, then a player might not trust the mechanism and might not 
want to reveal private information to the mechanism. 

In our problem, in order to estimate $n_1$, the mechanism should learn information about players' data.
Suppose that the mechanism has a prior belief $G$ about the data in $D$. That is, the mechanism believes that
the probability of $D_i = j$ is $\Pr_G[D_i = j]$ according to the prior belief. The mechanism learns about $D_i$ if
the mechanism believes that $\Pr[D_i = j] \ne \Pr_G[D_i = j]$ after running the mechanism, for some $j$. There
are two possible ways to learn about players' data. The first way is to read $D_i$ explicitly. The second way
is to read players' actions and deduce something about their $D_i$. For example, if the mechanism is direct
and truthful, then the players report $v_i$ truthfully. Suppose that the prior belief is that every player's data are
drawn from a uniform distribution. That is, $\Pr_G[D_i = j]$ is the same for all $i$ and $j$. If $F_j(v_i) < F_{j'}(v_i)$
for some $j$ and $j'$, and player $i$ truthfully reports $v_i$, then the mechanism's posterior belief is that
$\Pr[D_i = j] < \Pr[D_i = j']$, which is different from the prior belief. Learning anything about a player's data may
compromise a player's privacy and should be compensated. Thus, there are two kinds of cost to a player that
should be compensated, one is for using the player's data and one is for learning about the player's data.
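
To make the second channel concrete, the following sketch applies Bayes' rule to a truthfully reported cost. It is only an illustration: the exponential cost distributions, their rates, and the use of densities (rather than the cumulative distribution functions used in the text) are assumptions of this example.

```python
import math

def exp_pdf(v: float, rate: float) -> float:
    """Density of an exponential distribution (an assumed cost model)."""
    return rate * math.exp(-rate * v) if v >= 0 else 0.0

def posterior(prior: list, rates: list, v: float) -> list:
    """Posterior over types after observing a truthfully reported cost v."""
    weights = [p * exp_pdf(v, r) for p, r in zip(prior, rates)]
    total = sum(weights)
    return [w / total for w in weights]

prior = [0.5, 0.5]   # uniform prior belief G over two types
rates = [2.0, 0.5]   # hypothetical: type 1 tends to have low costs, type 2 high costs

# A high reported cost shifts the belief strongly toward type 2.
print(posterior(prior, rates, v=3.0))
```

Because the posterior differs from the uniform prior, a direct mechanism that reads the reported cost has learned about the player's data, even though it never read the database entry itself.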

For the latter cost, we propose the concept of perfect data privacy, which is inspired by the concept 
of perfect objective privacy introduced in [7]. A mechanism satisfies perfect data privacy if whenever the
mechanism's posterior belief about a player's data differs from its prior belief, the mechanism pays the 
player. Under perfect data privacy, mechanisms can learn about a player's cost, as long as that knowledge 
does not reveal anything about his data. However, for a perfectly data private mechanism, if the mechanism 
learns about a player's data, then the mechanism always compensates the player, even when the mechanism 
does not use the player's data to compute the estimate.

Our Main Contribution. We give a mechanism that is BIC, EIIR, $O(\epsilon^{-1})$-accurate, perfectly data private,
and $\epsilon$-differentially private. To achieve our privacy guarantees, we propose a posted-price-like mechanism,
described in Section 3. Given the set of types of players and the distributions of costs, the mechanism writes
a contract that offers a different expected payment for each type. Each player is offered this contract. If
a player accepts the contract, then his payment is determined by his verifiable type and the payment for
his type in the contract. The player's action is either to accept the contract or reject the contract. A player
decides truthfully if a player with type $j$ accepts the contract when $\epsilon v_i \le r_j$, where $r_j$ is the payment for type
$j$ in the contract. We prove that this posted-price-like mechanism is BIC, EIIR, $O(\epsilon^{-1})$-accurate, perfectly
data private, and $\epsilon$-differentially private.

We seek a mechanism with a small payment. In Section 4, we define a benchmark for the expected
payment of a mechanism and compare the expected payment of our mechanism to this benchmark in two 
different settings. When costs are non-negative, we show that our mechanism is close to the benchmark. 

We also prove a lower bound on the accuracy that a direct and data private mechanism can achieve in
Section 2.

1.1 Related Work 

Selling Privacy. Our paper is closely related to the privacy preserving mechanisms studied in [8]. In [8],
the authors extend the definition of $\epsilon$-differentially private algorithms to $\epsilon$-differentially private mechanisms.
Under their definition of an $\epsilon$-differentially private mechanism, the randomness only comes from the mechanism.
In our model, since we want to protect the privacy of the costs, which are drawn from distributions, our
definition of an $\epsilon$-differentially private mechanism relies both on the distributions of the costs and the
randomness of the mechanism.

Differential Privacy. A comprehensive survey of differential privacy appears in [4]. Most of the previous 
results are based on random perturbations of the output, and assume that the mechanism has the ability 
to access the whole database. If the mechanism cannot access the whole database, Chaudhuri et al. [1] 
and Klonowski et al. show that random sampling is enough to ensure differential privacy with high 
probability. That is, it is not necessary to add more noise to the output. 

Differential Privacy and Mechanism Design. McSherry et al. [13] use a privacy preserving algorithm
as a tool to design an approximately dominant strategy truthful mechanism. Instead, we focus on treating
sensitive data as a commodity that can be sold.

Privacy Concerns in Mechanisms. Traditional mechanism design theory focuses on drawing private
information from players in order to compute a result. However, if players have privacy concerns, they may not
want to reveal their information. Feigenbaum et al. [7] study how to quantify the information leakage to the
mechanism based on communication complexity.

Xiao [18] quantifies the information leakage in a mechanism based on information theory. In his model,
the outcome of a privacy preserving mechanism not only motivates the players to participate but also protects
the private information of players. In independent work, Nissim et al. [16] and Chen et al. consider
privacy issues in mechanism design in the context of elections and discrete facility location.

Posted-Price Mechanisms. In a posted-price mechanism, player $i$ is offered a price $r_i$. If player $i$ accepts
that price, then $i$ pays $r_i$ to get the allocation. Goldberg et al. [9] show that the posted-price mechanism is
collusion resistant. Moreover, the players do not need to know or report their private values precisely. They
only decide to accept or reject the price. Chawla et al. [2] point out that this could be useful in reducing the
private information revealed to the mechanism.

Revenue Maximization in Bayesian Mechanism Design. In a classic paper, Myerson [14] characterizes
the optimal BIC selling mechanism that maximizes the expected revenue. In procurement mechanisms,
each player is a supplier and each player's production cost is private information. The auctioneer is the
buyer and wants to minimize the expected payment. In the computer science literature, an early paper in
this area characterizes the minimum-cost dominant strategy truthful auction to buy an s-t path in a graph [6].
Since then, there has been considerable interest in both frugal mechanism design (buying a feasible set at 
low cost), and budget-constrained mechanism design (buying as good a set as possible subject to a budget). 
Our work can be seen as a generalization of these questions to the setting of bidders who are reluctant to 
reveal their costs, and the feasibility of a set depends on the private costs (via the correlation with data). 

2 Model and Lower Bound 
2.1 Model 

There is a database $D \in [h]^n$ and $n$ players, where each player has data $D_i$. Player $i$ with $D_i = j$ has a
private cost $v_i$ drawn from a distribution with cumulative distribution function $F_j$. Note that this definition
is different from the traditional definition of a Bayesian setting. In the traditional definition, the distribution 
of Vi is known to every player and the mechanism. In our definition, the mechanism and players know that 
each player's Vi is drawn from one of a set of distributions, but the particular distribution depends on the 
individual player's data, which is unknown to everyone but that player. 

The goal of our mechanism is to estimate $n_1$ based on $D$ and determine the payment $p_i$ for every player
$i$. A mechanism first specifies the set of possible actions $Y$ that players can take. Then, based on players'
actions and the database, the mechanism determines the estimate and payment. Formally, a mechanism is a
function $M : Y^n \times [h]^n \to \mathbb{R} \times \mathbb{R}^n$. The mechanism has an a priori belief $G$ about the data in $D$. That is, the
mechanism believes that the probability of $D_i = j$ is $\Pr_G[D_i = j]$. Recall that the mechanism learns about
$D_i$ if, after running the mechanism, the mechanism believes that $\Pr[D_i = j] \ne \Pr_G[D_i = j]$ for some $j$.
We use a vector $x \in \{0, 1\}^n$ to indicate whether the mechanism learns something about each player's data.
If the mechanism learns about $D_i$, then $x_i = 1$. A mechanism is perfectly data private if, when $x_i = 1$,
player $i$'s expected payment from the mechanism is at least $\epsilon v_i$. We focus on randomized mechanisms in
this paper, that is, $x_i$ and payment $p_i$ are random variables.

Next, we define the utility for a player. If $x_i = 1$, there is a cost $\epsilon v_i$ to player $i$, since something about
$D_i$ is learned. For $y \in Y^n$ representing all players' actions, the utility for player $i$ is $u_i(y, v_i) = p_i - \epsilon x_i v_i$,
where $(\hat{s}, p) = M(y, D)$. In this paper, we assume that players are rational, so players want to maximize
their expected utilities. The strategy of player $i$ is a function $q_i : \mathbb{R} \times [h] \to Y$ mapping from $v_i$ and $D_i$ to
an action. Since players want to maximize their expected utilities, they will take an action that is not worse
than any other action.

Finally, we introduce the solution concept. A profile of strategies $q_1, \ldots, q_n$ is a Bayesian-Nash
equilibrium if for all $i$, $v_i$, and $y_i' \in Y$, $E[u_i(q(v_i, v_{-i}, D), v_i)] \ge E[u_i((y_i', q_{-i}(v_{-i}, D_{-i})), v_i)]$, where the
randomness is from the mechanism and the randomness of $v_{-i}$. A direct mechanism is Bayesian incentive
compatible (BIC) if $q_i(v_i, D_i) = v_i$ is a Bayesian-Nash equilibrium for every player $i$.

2.2 Lower Bound 

In order to ensure that players have an incentive to participate in the mechanism, we want the mechanism to be
individually rational. However, we can show that for any direct, BIC, and EIIR mechanism, there is a lower
bound on accuracy. Since the condition of EIIR is weaker than EPIR, the lower bound for EIIR also implies
a lower bound for EPIR mechanisms.

Lemma 2.1. If the functions Fi are arbitrary functions with unbounded range, then for any k < n/2, no 
k-accurate, direct, BIC, EIIR, and perfectly data private mechanism exists. 

Proof. Suppose that $M$ is a BIC, EIIR, perfectly data private, and $k$-accurate mechanism. First, we show
that $M$ must access at least one player's cost or data. Assume that $M$ does not access any cost or data.
Thus, $M$ randomly outputs an estimate $\hat{s}$, which is independent of costs and data. For a database $D^1$ with
all entries equal to one, since $M$ is $k$-accurate, $\Pr[\hat{s} \in [n-k, n]] \ge \frac{2}{3}$. Similarly, if a database $D^0$ has no
entries equal to one, then $\Pr[\hat{s} \in [0, k]] \ge \frac{2}{3}$. Because $k < n/2$, $[n-k, n]$ and $[0, k]$ do not overlap. But
the sum of these two probabilities is greater than one, which is impossible. Hence, $M$ must access at
least one player's cost or data.

Suppose that $D_i \in \{1, 2\}$ and $F_1(v) \ne F_2(v)$ for all $v$. For any $v$, if $M$ accesses $v_i = v$, then the
mechanism must pay player $i$, since $F_1(v) = \Pr[v_i = v \mid D_i = 1] \ne \Pr[v_i = v \mid D_i = 2] = F_2(v)$ and
$M$ is perfectly data private. Let $x_i$ be the indicator random variable representing whether player $i$'s cost is
accessed. Let $p_i$ be the random variable representing player $i$'s payment. Since $M$ is BIC, we suppose that
players other than $i$ report truthfully. Since the mechanism decides to access $v_i$ based on $v_{-i}$, $\Pr[x_i = 1]$
is independent of $v_i$. Because $M$ must access at least one player's cost, we can find a player $i$ such that
$\Pr[x_i = 1] > 0$. For a fixed $v_i$, the expected utility of $i$ is $E[p_i] - \epsilon v_i E[x_i]$. Since the range of $F$ is
unbounded, we can find another $v_i' > \frac{E[p_i]}{\epsilon E[x_i]}$. Let $p_i'$ be the payment to player $i$ when he reports $v_i'$.
Since $M$ is EIIR, we have $E[p_i'] \ge \epsilon v_i' E[x_i]$. Thus, for player $i$ with cost $v_i$, if $i$ overbids $v_i'$, the
utility is $E[p_i'] - \epsilon v_i E[x_i] \ge \epsilon v_i' E[x_i] - \epsilon v_i E[x_i] > E[p_i] - \epsilon v_i E[x_i]$.
Thus, player $i$ can increase expected utility by overbidding. Hence, $M$ is not BIC. □

Our mechanism, which is explained in the next section, is an indirect mechanism since it does not ask for
players' costs. The revelation principle, which states that if there exists an indirect mechanism implementing
a function in Bayesian-Nash equilibrium, then there also exists a direct BIC mechanism implementing the
same function, is irrelevant under the desire for perfect data privacy. It is easy to construct a direct
mechanism from our indirect mechanism. However, this direct mechanism accesses all players' data without
compensating all players. Thus, this direct mechanism is not perfectly data private.

$\epsilon$-Differential Privacy. The traditional definition of $\epsilon$-differential privacy compares the outcomes of the
algorithm applied to adjacent databases. However, with a mechanism that offers payments, the mechanism
may use both the database and the replies to the mechanism to compute an estimate and payments. Since
replies depend on the individuals' costs, we compare the outcomes of the mechanism applied to two cost-data
pairs $(v, D)$ and $(v', D')$. A cost vector $v = (v_1, \ldots, v_n)$ is drawn according to a database $D$ if
$v_i$ is drawn from $F_j$, where $D_i = j$. Two cost-data pairs $(v, D)$ and $(v', D')$ are adjacent if $D$ and $D'$
differ only in the $i$-th entry and $v$ and $v'$ are independently drawn according to databases $D$ and $D'$. A BIC
mechanism is $\epsilon$-differentially private if, for any pair of adjacent cost-data pairs, the estimate and payments
satisfy $\epsilon$-differential privacy.

Bayesian Assumptions. Our definition of $\epsilon$-differential privacy is based on the common belief $F$. That is,
the player decides his strategy assuming that other players' costs are drawn from $F$ and all players believe
this assumption. If a player allows his data to be used, then he may incur an expected cost $\epsilon v_i$. The expected
cost to the player depends on $\epsilon$ and thus also depends on the common belief $F$. Having a common belief
is a traditional assumption in the Bayesian setting. Moreover, most BIC mechanisms become meaningless
when the common belief is not true. Thus, we assume that the common belief $F$ is correct.

3 Mechanism 

In this section, we give a perfectly data private, BIC, EIIR, $\epsilon$-differentially private, and $O(\epsilon^{-1})$-accurate
mechanism. Every player $i$ has data $D_i \in [h]$. To start, we assume that $F_j$ is continuous for $j \in \{1, 2\}$.

The mechanism designs and offers contracts to players. The contract guarantees an expected payment to 
each player who accepts the contract. The players decide to accept or reject the contract. Thus, the possible 
actions for players are "accept" or "reject". The mechanism uses the data of players who accept the contract 
to estimate $n_1$. The estimate is unbiased if the expected value of the estimate is $n_1$. To obtain an unbiased
estimate, the set of players who accept the contract should be unbiased, that is, the probability of a player 
accepting the contract should be equal for all players. Moreover, since the mechanism pays players, the 
costs of players in the accepting set should be bounded. 

The mechanism first finds $a_j$ for $j \in [h]$, such that $F_j(a_j) = c$, where $c$ will be determined later. Then,
each player $i$ is given a contract: "If $D_i = j$, your expected payment will be $\epsilon a_j$." A player $i$ with $D_i = j$
decides truthfully if player $i$ accepts the contract when $v_i \le a_j$ and rejects otherwise. Let $W$ be the set
of players who accept the contract. If all players decide truthfully, the cost to each player in $W$ is bounded
by $\max_j a_j$. Since for player $i$ with $D_i = j$, $\Pr[v_i \le a_j] = c$, every player accepts the contract with
probability $c$. Thus, $W$ is an unbiased and cost-bounded sample set.
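
The threshold construction can be sketched numerically as follows. This is an illustration under assumed exponential cost distributions (so that $F_j$ can be inverted in closed form); the rates and function names are hypothetical and not taken from the paper.

```python
import math
import random

def threshold(rate: float, c: float) -> float:
    """Solve F_j(a_j) = c for an exponential CDF F_j(a) = 1 - exp(-rate * a)."""
    return -math.log(1.0 - c) / rate

def acceptance_rate(rate: float, c: float, trials: int, rng: random.Random) -> float:
    """Fraction of simulated truthful players of one type who accept,
    i.e. whose sampled cost is at most the type's threshold."""
    a = threshold(rate, c)
    accepted = sum(1 for _ in range(trials) if rng.expovariate(rate) <= a)
    return accepted / trials

rng = random.Random(0)
c = 0.3
for rate in (2.0, 0.5):   # hypothetical cost distributions for two types
    print(rate, threshold(rate, c), acceptance_rate(rate, c, 100_000, rng))
```

The thresholds differ across types, but the simulated acceptance frequency should be close to $c$ for each type, which is what makes the accepting set an unbiased sample.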

Since the probability that a player accepts the contract is $c$, the value $m := |\{i \in W : D_i = 1\}|$ is a
random variable $\mathrm{bin}(n_1, c)$ from a binomial distribution² $\mathrm{Bin}(n_1, c)$. Since the expected value of $m$ is $c n_1$,
$m/c$ is an unbiased estimate of $n_1$. We say $m/c$ is a naive estimate of $n_1$.
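
As a quick check of the unbiasedness claim, the sketch below computes $E[m/c]$ exactly by summing over the binomial probability mass function. The values of $n_1$ and $c$ are arbitrary choices for illustration.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability that a Bin(n, p) random variable equals k."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def expected_naive_estimate(n1: int, c: float) -> float:
    """E[m / c] for m ~ Bin(n1, c), computed by summing over the pmf."""
    return sum((k / c) * binom_pmf(k, n1, c) for k in range(n1 + 1))

# Should agree with n1 = 40 up to floating-point rounding.
print(expected_naive_estimate(n1=40, c=0.3))
```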

We explain how to produce an estimate that satisfies $\epsilon$-differential privacy. Although the naive estimate
is an unbiased estimate of $n_1$, it does not satisfy differential privacy. Consider a pair of adjacent cost-data
pairs $(v, D)$ and $(v', D')$, where $D$ and $D'$ differ in the $i$-th entry. Let $n_1$ be the number of players $i$ with
$D_i = 1$ and $n_1'$ be the number of players $i$ with $D_i' = 1$. The naive estimate does not satisfy differential

²A binomial distribution with parameters $n$ and $p$ is denoted by $\mathrm{Bin}(n, p)$. The probability mass function of $\mathrm{Bin}(n, p)$ is
$f(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}$. Let $\mathrm{bin}(n, p)$ denote a random variable drawn from $\mathrm{Bin}(n, p)$. The expected value of $\mathrm{bin}(n, p)$ is
$np$ and the variance is $np(1-p)$.

Mechanism 1: $\epsilon$-differentially private mechanism
input : privacy parameter $\epsilon$; cost distributions $F_j$, $j \in [h]$
output: estimate $\hat{s}$; payment $p$

Pick a real number $c \in (0, 1)$.
Find $a_j$ for all $j \in [h]$, such that $F_j(a_j) = c$.
For each player $i$, offer a contract:
    If $D_i = j$, the expected payment will be $\epsilon a_j$.
Let $W = \{i : i \text{ accepts the contract}\}$.
Let $m = |\{i \in W : D_i = 1\}|$.
Let $\tilde{s} = \frac{1}{c}(m + \mathrm{lap}(1/\epsilon))$.
$\hat{s} = \tilde{s}$ if $\tilde{s} \in [0, n]$; $\hat{s} = 0$ if $\tilde{s} < 0$; $\hat{s} = n$ if $\tilde{s} > n$.
$p_i = 0$ if $i \notin W$; $p_i = \epsilon(a_j + \mathrm{lap}(\gamma/\epsilon))$, where $\gamma := |\max_j a_j - \min_j a_j|$, if $i \in W$ and $D_i = j$.
return $\hat{s}$ and $p$



privacy, since if $D_i = 1$ and $\epsilon v_i \le \epsilon a_1$, then an outsider can infer $D_i$ easily by comparing the naive estimates
of $n_1$ and $n_1'$. Thus, we should introduce random noise to the naive estimate to satisfy differential privacy.

The mechanism uses the Laplacian distribution as a source of the random noise. Laplacian noise is
commonly used to obtain differential privacy. A Laplacian distribution with mean $0$ and parameter $b > 0$ is
denoted by $\mathrm{Lap}(b)$. The probability density function of $\mathrm{Lap}(b)$ is

$$f(x) = \frac{1}{2b} \exp\left(-\frac{|x|}{b}\right).$$

Let $\mathrm{lap}(b)$ denote a random variable drawn from $\mathrm{Lap}(b)$.

In order to make the estimate satisfy differential privacy, the mechanism adds Laplacian noise to the
naive estimate. Since the mean of the Laplacian noise is zero, $\tilde{s} = \frac{1}{c}(m + \mathrm{lap}(1/\epsilon))$ is an unbiased estimate
of $n_1$. However, $\tilde{s}$ might be larger than $n$ or be negative, both of which are meaningless. We truncate $\tilde{s}$ to
get $\hat{s}$, that is, when $\tilde{s} > n$, the mechanism outputs $n$ and when $\tilde{s} < 0$, the mechanism outputs $0$.

We also use Laplacian noise to produce payments that satisfy $\epsilon$-differential privacy. By the
construction of the contract, for any player $i$ with $D_i = j$ who accepts the contract, the mechanism pays player
$i$ $\epsilon a_j$ in expectation. If the mechanism paid player $i$ exactly $\epsilon a_j$ deterministically, then an outsider could infer
player $i$'s data easily. Thus, we should introduce noise to the payments. We add noise $\epsilon\,\mathrm{lap}(\gamma/\epsilon)$ to the
payment, where $\gamma := |\max_j a_j - \min_j a_j|$. Thus, $p_i = \epsilon(a_j + \mathrm{lap}(\gamma/\epsilon))$. Since the expected value of $\mathrm{lap}(\cdot)$ is
zero, the expected payment of player $i$ is $\epsilon a_j$, which satisfies the guarantee in the contract. Moreover, since
$\epsilon a_j$ is at least $\epsilon v_i$ for every player who accepts truthfully, the mechanism is EIIR. The formal description of
the mechanism is in Mechanism 1.
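
The steps described in this section can be put together in a short simulation. The sketch below is an illustration under stated assumptions, not a reference implementation: players are simulated as deciding truthfully, the quantile functions are supplied by the caller, and all names (`mechanism1`, `lap`, `quantile`) as well as the exponential cost model in the usage lines are hypothetical.

```python
import math
import random

def lap(b: float, rng: random.Random) -> float:
    """Sample Laplace noise with mean 0 and scale b, as the difference
    of two independent exponential variables with mean b."""
    if b == 0:
        return 0.0
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)

def mechanism1(data, costs, quantile, eps, c, rng):
    """Sketch of Mechanism 1 with simulated truthful players.

    data[i]  : type D_i of player i (a key of `quantile`)
    costs[i] : private cost v_i, used only to simulate the accept/reject reply
    quantile : dict mapping type j to a function q with F_j(q(c)) = c
    """
    n = len(data)
    a = {j: q(c) for j, q in quantile.items()}          # thresholds a_j
    gamma = abs(max(a.values()) - min(a.values()))
    # A truthful player accepts exactly when v_i <= a_j.
    accepted = {i for i in range(n) if costs[i] <= a[data[i]]}
    m = sum(1 for i in accepted if data[i] == 1)
    s_tilde = (m + lap(1.0 / eps, rng)) / c             # noisy unbiased estimate
    s_hat = min(max(s_tilde, 0.0), n)                   # truncate to [0, n]
    payments = [
        eps * (a[data[i]] + lap(gamma / eps, rng)) if i in accepted else 0.0
        for i in range(n)
    ]
    return s_hat, payments

rng = random.Random(1)
rates = {1: 2.0, 2: 0.5}   # hypothetical exponential cost distributions
quantile = {j: (lambda c, r=r: -math.log(1.0 - c) / r) for j, r in rates.items()}
data = [1] * 300 + [2] * 700
costs = [rng.expovariate(rates[d]) for d in data]
s_hat, payments = mechanism1(data, costs, quantile, eps=0.5, c=0.3, rng=rng)
print(s_hat)
```

Note that an individual noisy payment can be negative; the contract only guarantees the payment in expectation.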

Lemma 3.1. Mechanism 1 is perfectly data private. 

Proof. Let $y_i$ be player $i$'s reply to the contract. By construction of the contract, if $i$ decides truthfully, then
$\Pr[y_i = \text{``accept''} \mid D_i = j] = c$ for all $j \in [h]$. That is, the event of accepting the contract and
$D_i$ are independent. Thus, for any $i$, the mechanism cannot learn about $D_i$ by reading $y_i$. Moreover, the
mechanism only reads $D_i$ where $i \in W$. Since player $i \in W$ with $D_i = j$ is paid $\epsilon a_j$ in expectation and
$v_i \le a_j$, the mechanism satisfies the requirement. □

Lemma 3.2. Mechanism 1 is BIC and EIIR.

Proof. (BIC) The payments for players who are not in $W$ are always 0, so rejecting the contract yields utility 0.
For player $i$, there are two cases.
Case 1: $D_i = j$ and $v_i \le a_j$. Accepting the contract yields expected utility $\epsilon(a_j - v_i) \ge 0$, so accepting is a best response.
Case 2: $D_i = j$ and $v_i > a_j$. Accepting the contract yields expected utility $\epsilon(a_j - v_i) < 0$, so rejecting is a best response.

(EIIR) Suppose that every player decides truthfully. Then only players with $v_i \le a_j$ and $D_i = j$ for
some $j$ are in $W$. Since the expected payment for $i$ with $D_i = j$ is $\epsilon a_j$, the expected utility of the player is
non-negative. □
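
The two cases of the BIC argument can be checked with a few lines of code. This is a minimal sketch in which the privacy parameter, the threshold, and the function names are hypothetical values chosen for illustration.

```python
def expected_utility(accept: bool, v: float, a_j: float, eps: float) -> float:
    """Expected utility of a type-j player with cost v:
    eps * (a_j - v) if he accepts the contract, and 0 if he rejects."""
    return eps * (a_j - v) if accept else 0.0

def decides_truthfully(v: float, a_j: float) -> bool:
    """Truthful decision rule: accept exactly when v <= a_j."""
    return v <= a_j

eps, a_j = 0.5, 1.0   # hypothetical privacy parameter and threshold
for v in (0.2, 1.0, 1.7):
    truthful = expected_utility(decides_truthfully(v, a_j), v, a_j, eps)
    best = max(expected_utility(True, v, a_j, eps),
               expected_utility(False, v, a_j, eps))
    # The truthful decision attains the best achievable expected utility.
    print(v, truthful, best)
```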

Two random variables $x_1$ and $x_2$ are $\epsilon$-mutually bounded if, for all $I \subseteq \mathbb{R}$, $\Pr[x_1 \in I] \le e^{\epsilon} \Pr[x_2 \in I]$ and
$\Pr[x_2 \in I] \le e^{\epsilon} \Pr[x_1 \in I]$.

Lemma 3.3 (Fact 2 in [8]). If $x_1$ and $x_2$ are $\epsilon$-mutually bounded and $f$ is a function, then $f(x_1)$ and $f(x_2)$
are also $\epsilon$-mutually bounded. □

Lemma 3.4 ([5]). Let $x_1$ and $x_2$ be two random variables. If $|x_1 - x_2| \le k$, then $x_1 + \mathrm{lap}(k/\epsilon)$ and $x_2 + \mathrm{lap}(k/\epsilon)$
are $\epsilon$-mutually bounded. □
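
For fixed values $x_1$ and $x_2$, the statement of Lemma 3.4 follows from a pointwise bound on the ratio of the two shifted Laplace densities, which in turn bounds the ratio of probabilities of any set. The sketch below checks that pointwise bound on a grid; the specific values and the grid are arbitrary choices for illustration.

```python
import math

def laplace_pdf(x: float, b: float) -> float:
    """Density of Lap(b): exp(-|x| / b) / (2b)."""
    return math.exp(-abs(x) / b) / (2.0 * b)

def worst_ratio(x1: float, x2: float, k: float, eps: float, grid) -> float:
    """Largest ratio, over grid points t, of the density of x1 + lap(k/eps)
    at t to the density of x2 + lap(k/eps) at t."""
    b = k / eps
    return max(laplace_pdf(t - x1, b) / laplace_pdf(t - x2, b) for t in grid)

grid = [i / 10.0 for i in range(-200, 201)]
# With |x1 - x2| <= k, the ratio should not exceed e^eps (up to rounding).
print(worst_ratio(3.0, 4.0, k=1.0, eps=0.5, grid=grid), math.exp(0.5))
```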

The next two lemmas address the $\epsilon$-differential privacy of the payment and the estimate. Let $(v, D)$ and
$(v', D')$ be adjacent cost-data pairs. Let $(\hat{s}, p)$ and $(\hat{s}', p')$ be the results for $(v, D)$ and $(v', D')$ respectively.

Lemma 3.5. For any $I \subseteq \mathbb{R}$, $\Pr[\hat{s} \in I] \le e^{\epsilon} \Pr[\hat{s}' \in I]$.

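Proof. Without loss of generality, we assume that $D_i = 1$ and $D_i' \ne 1$. First,
$\Pr[\hat{s} \in I] = \int_{v_{-i} \in \mathbb{R}^{n-1}} \Pr[\hat{s} \in I \mid v_{-i}] \Pr[v_{-i}] \, dv_{-i}$. Similarly,
$\Pr[\hat{s}' \in I] = \int_{v_{-i} \in \mathbb{R}^{n-1}} \Pr[\hat{s}' \in I \mid v_{-i}] \Pr[v_{-i}] \, dv_{-i}$. Let $q_w$ and $q_w'$ be
two random variables, which are equal to $\hat{s}$ and $\hat{s}'$ when $v_{-i} = w$. If $q_w$ and $q_w'$ are $\epsilon$-mutually bounded for
all $w$, then $\hat{s}$ and $\hat{s}'$ are $\epsilon$-mutually bounded, since then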



Pr[s £l]= [ 


Pi[s G / V-i = w] Pr[?;_j = 




|n-l 


-j 


Pr[q^ G /] Pr['t;_i = w\dw 


Jwei 


<-l 


Pt[^^ G /] Pr[w_j = w\dw 




|n-l 




e*^ Pr[s' G / V-i = vo\ PY[v-i 







The case Pr[s' G /] < e*^ Pr[s G /] can be shown by a symmetric argument. 

Here, we show that $q_w$ and $q'_w$ are $\epsilon$-mutually bounded for all $w$. Fix $v_{-i} = w$. Let $W_w$ and $W'_w$
be the sets of players accepting the contract when applying the algorithm to inputs $(v, D)$ and $(v', D')$
respectively. Let $m_w := |\{i : D_i = 1, i \in W_w\}|$ and $m'_w := |\{i : D'_i = 1, i \in W'_w\}|$. When applying
the mechanism to inputs $(v, D)$ and $(v', D')$, the mechanism computes $\hat{s}_w = \frac{1}{c}(m_w + \mathrm{lap}(1/\epsilon))$ and
$\hat{s}'_w = \frac{1}{c}(m'_w + \mathrm{lap}(1/\epsilon))$ respectively. Then, the mechanism truncates $\hat{s}_w$ and $\hat{s}'_w$ to get $s_w$ and $s'_w$. By Lemma 3.3,
since multiplication and truncation are functions, it suffices to show that $m_w + \mathrm{lap}(1/\epsilon)$ and $m'_w + \mathrm{lap}(1/\epsilon)$ are
$\epsilon$-mutually bounded when $v_{-i} = w$. Since $W_w \setminus W'_w$ is either the empty set or $\{i\}$, the difference between
$m_w$ and $m'_w$ is at most one. Thus, Lemma 3.4 implies that $m_w + \mathrm{lap}(1/\epsilon)$ and $m'_w + \mathrm{lap}(1/\epsilon)$ are $\epsilon$-mutually
bounded. Thus, $q_w$ and $q'_w$ are $\epsilon$-mutually bounded for all $w$, and hence $s$ and $s'$ are $\epsilon$-mutually bounded. □

Lemma 3.6. For all $i \in [n]$ and for all $I \subseteq \mathbb{R}$, $\Pr[p_i \in I] \le e^{\epsilon} \Pr[p'_i \in I]$.

Proof. Without loss of generality, we assume that $D_i = 1$ and $D'_i \ne 1$. For player $j \ne i$, if $j \notin W$, the
payment is zero. If $j \in W$, the payment to $j$ depends only on the data $D_j$ and does not depend on the
set of players receiving payments. Thus, $p_j$ does not change and we only need to consider $p_i$. Note that
$p_i$ can be non-zero only if player $i$ is in $W$. If $i \in W$, then $p_i$ is a random variable $P_i = \epsilon(\alpha_1 + \mathrm{lap}(\gamma/\epsilon))$.
Thus, for any $I \subseteq \mathbb{R} \setminus \{0\}$, the probability $\Pr[p_i \in I] = c \Pr[P_i \in I]$, where $c$ is the probability that a
player accepts the contract. The probability $\Pr[p_i = 0] = (1 - c) + c \Pr[P_i = 0]$. Suppose that $D'_i = j'$.
Symmetrically, let $P'_i = \epsilon(\alpha_{j'} + \mathrm{lap}(\gamma/\epsilon))$; for any $I \subseteq \mathbb{R} \setminus \{0\}$, the probability $\Pr[p'_i \in I] = c \Pr[P'_i \in I]$
and $\Pr[p'_i = 0] = (1 - c) + c \Pr[P'_i = 0]$.

Thus, it suffices to show that $P_i$ and $P'_i$ are $\epsilon$-mutually bounded. By Lemma 3.3, since multiplication
is a function, it is sufficient to show that $\alpha_1 + \mathrm{lap}(\gamma/\epsilon)$ and $\alpha_{j'} + \mathrm{lap}(\gamma/\epsilon)$ are $\epsilon$-mutually bounded. By Lemma
3.4, since $|\alpha_1 - \alpha_{j'}| \le \gamma$, $\alpha_1 + \mathrm{lap}(\gamma/\epsilon)$ and $\alpha_{j'} + \mathrm{lap}(\gamma/\epsilon)$ are $\epsilon$-mutually bounded. □



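Lemma 3.7. Mechanism 1 is $\sqrt{3\left(\frac{n(1-c)}{c} + \frac{2}{c^2\epsilon^2}\right)}$-accurate.

Proof. Let $\hat{s}$ denote the estimate before truncation. Since the error term $|s - n_1|$ is at most $|\hat{s} - n_1|$, we can analyze $|\hat{s} - n_1|$ to get a bound on the
error. Since $E[m] = c n_1$, $E[\hat{s}] = \frac{1}{c}(E[m] + E[\mathrm{lap}(1/\epsilon)]) = n_1$ by linearity of expectation.

\[
|s - n_1| \le |\hat{s} - n_1| = \frac{1}{c}\left|m + \mathrm{lap}(1/\epsilon) - n_1 c\right| = \frac{1}{c}\left|\mathrm{bin}(n_1, c) + \mathrm{lap}(1/\epsilon) - E[\mathrm{bin}(n_1, c) + \mathrm{lap}(1/\epsilon)]\right|.
\]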

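In order to prove accuracy with high probability, we use Chebyshev's inequality.

Lemma 3.8 (Chebyshev's inequality). Let $X$ be a random variable with expected value $\mu$ and variance $\sigma^2$.
For any real number $k > 0$, $\Pr[|X - \mu| \ge k\sigma] \le \frac{1}{k^2}$.

We set $k = \sqrt{3}$ and let $X \sim \mathrm{bin}(n_1, c) + \mathrm{lap}(1/\epsilon)$ with $\mathrm{Var}[X] = n_1 c(1 - c) + \frac{2}{\epsilon^2}$ to get

\[
\Pr\left[\left|\mathrm{bin}(n_1, c) + \mathrm{lap}(1/\epsilon) - E[\mathrm{bin}(n_1, c) + \mathrm{lap}(1/\epsilon)]\right| \ge \sqrt{3\left(n_1 c(1 - c) + \frac{2}{\epsilon^2}\right)}\right] \le \frac{1}{3}.
\]

This is equivalent to

\[
\Pr\left[\frac{1}{c}\left|\mathrm{bin}(n_1, c) + \mathrm{lap}(1/\epsilon) - E[\mathrm{bin}(n_1, c) + \mathrm{lap}(1/\epsilon)]\right| \ge \sqrt{3\left(\frac{n_1(1 - c)}{c} + \frac{2}{c^2\epsilon^2}\right)}\right] \le \frac{1}{3}.
\]

Thus, since $n_1 \le n$ and $|s - n_1|$ is at most the error $|\hat{s} - n_1|$ of the estimate $\hat{s}$ before truncation,

\[
\Pr\left[|s - n_1| \ge \sqrt{3\left(\frac{n(1 - c)}{c} + \frac{2}{c^2\epsilon^2}\right)}\right] \le \Pr\left[|\hat{s} - n_1| \ge \sqrt{3\left(\frac{n_1(1 - c)}{c} + \frac{2}{c^2\epsilon^2}\right)}\right] \le \frac{1}{3}. \qquad □
\]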



The mechanism can pick $c$ freely. If the mechanism picks a constant $c$ such that $n(1-c)/c = O(\epsilon^{-2})$, the
mechanism is $O(\epsilon^{-1})$-accurate.
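The accuracy guarantee can be checked empirically. The sketch below simulates the estimator $\frac{1}{c}(m + \mathrm{lap}(1/\epsilon))$ with $m \sim \mathrm{bin}(n_1, c)$ and counts how often the error reaches the bound $\sqrt{3(n(1-c)/c + 2/(c^2\epsilon^2))}$. It is an illustration only: the truncation range $[0, n]$, the function names, and the parameter values are assumptions of this example.

```python
import math
import random

def sample_laplace(rng, scale):
    """Zero-mean Laplace draw: the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def estimate(rng, n, n1, c, eps):
    """One run of the estimator (m + lap(1/eps)) / c, truncated to [0, n]."""
    m = sum(1 for _ in range(n1) if rng.random() < c)  # m ~ bin(n1, c)
    raw = (m + sample_laplace(rng, 1.0 / eps)) / c
    return min(max(raw, 0.0), float(n))  # assumed truncation range

def chebyshev_bound(n, c, eps):
    """sqrt(3 (n(1-c)/c + 2/(c^2 eps^2))), the accuracy bound being tested."""
    return math.sqrt(3.0 * (n * (1.0 - c) / c + 2.0 / (c * c * eps * eps)))

def failure_rate(n, n1, c, eps, trials, seed=0):
    """Fraction of runs whose error |s - n1| reaches the bound."""
    rng = random.Random(seed)
    bound = chebyshev_bound(n, c, eps)
    bad = sum(1 for _ in range(trials)
              if abs(estimate(rng, n, n1, c, eps) - n1) >= bound)
    return bad / trials

# Hypothetical parameters: 200 players, 80 of them with D_i = 1.
rate = failure_rate(n=200, n1=80, c=0.5, eps=0.5, trials=2000)
```

Chebyshev's inequality only promises a failure rate of at most 1/3, so the observed rate is typically much smaller.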

We will extend this result to general data entry and discrete cost distributions in Section 3.1. Thus, we
have the main theorem.

Theorem 3.9. Mechanism 1 is BIC, EIIR, $O(\epsilon^{-1})$-accurate, perfectly data private, and $\epsilon$-differentially
private. □

3.1 Extensions and Computational Issues

General Database Entries. Suppose that each database entry has $d$ attributes, that is, $D_i \in [h]^d$. Given a
sequence $a_1, \ldots, a_d$, where $a_j \in [h]$, the data analyst wants to estimate $|\{i : \forall j,\ D_{ij} = a_j\}|$. For any $D_i$, we
can transform $D_i$ to a single attribute data $D'_i = 1 + \sum_{j=1}^{d} (D_{ij} - 1) h^{j-1}$ such that $D'_i \in [h^d]$. Then, we can
apply the mechanism to estimate the number of players with $D'_i = 1 + \sum_{j=1}^{d} (a_j - 1) h^{j-1}$.
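A minimal sketch of such a transformation, assuming the base-$h$ positional encoding $1 + \sum_j (D_{ij} - 1) h^{j-1}$ (any bijection from $[h]^d$ to $[h^d]$ would serve equally well); the function name `encode_entry` is hypothetical.

```python
def encode_entry(entry, h):
    """Map a d-attribute entry with values in 1..h to one value in 1..h**d."""
    code = 0
    for j, value in enumerate(entry):  # j = 0, ..., d-1
        if not 1 <= value <= h:
            raise ValueError("attribute values must lie in 1..h")
        code += (value - 1) * h ** j
    return 1 + code

# The target sequence a_1, ..., a_d is encoded the same way, so counting
# entries equal to (a_1, ..., a_d) becomes counting one single-attribute value.
target = encode_entry((2, 1, 3), h=3)
```

Because the encoding is one-to-one, the count of players with $D'_i$ equal to the encoded target equals the count of players whose attributes all match.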

Discrete Cost Distributions. When $F_j$ is a discrete probability function, the major difficulty is that for
a given $c$ and $j$, we may not find a suitable $\alpha_j$ such that $F_j(\alpha_j) = c$, because the cumulative probability
function of a discrete distribution is a step function. However, the mechanism can provide different contracts
to different players, and this ability allows us to design a mechanism for the discrete case.

The basic idea is that the mechanism uses randomness to pick $\alpha_j$ such that every player has equal
probability $c$ to accept the contract. For a given $c$ and for each $j$, if there is no $\alpha_j$ such that $F_j(\alpha_j) = c$, then
the mechanism finds the largest $\alpha_j^-$ and the smallest $\alpha_j^+$ such that $F_j(\alpha_j^-) = c_j^- < c$ and $F_j(\alpha_j^+) = c_j^+ > c$.
Note that a player $i$ with $D_i = j$ accepts the contract if his cost is smaller than the expected payment. If
the expected payment is $\alpha_j^+$, then the player accepts the contract with probability $c_j^+ > c$. On the other
hand, if the expected payment is $\alpha_j^-$, then the player accepts the contract with probability $c_j^- < c$. Let
$\beta_j = \frac{c - c_j^-}{c_j^+ - c_j^-}$. Player $i$ is given a contract "If $D_i = j$, your expected payment is $\epsilon\alpha_j$ in expectation," where
$\Pr[\alpha_j = \alpha_j^-] = 1 - \beta_j$ and $\Pr[\alpha_j = \alpha_j^+] = \beta_j$. Thus, $\Pr[v_i \le \alpha_j] = c_j^- + \beta_j(c_j^+ - c_j^-) = c$, where the
randomness is over the distribution of costs and the random choice of $\alpha_j$. We can prove that the mechanism
is perfectly data private, BIC, and EIIR by arguments similar to those in the proofs of Lemmas 3.1 and 3.2.
Since every player has equal probability $c$ to accept the contract, we can show that the mechanism satisfies
$\epsilon$-differential privacy of the estimate and is $O(\epsilon^{-1})$-accurate by arguments similar to those in the proofs of
Lemmas 3.5 and 3.7. In order to satisfy differential privacy of payments, we let $\gamma := \max_j \alpha_j^+ - \min_j \alpha_j^-$.
Then, the payments satisfy $\epsilon$-differential privacy by an argument similar to the proof of Lemma 3.6.
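The randomized choice of $\alpha_j$ can be sketched as follows for one discrete cost distribution. The code finds $\alpha^-$, $\alpha^+$ and $\beta$, then verifies that the acceptance probability is exactly $c$. It uses exact rational arithmetic; the function names and the sample distribution are assumptions of this example, and the case where $c$ lies below the smallest CDF value is simply rejected.

```python
from fractions import Fraction

def randomized_threshold(support, probs, c):
    """Return (alpha_minus, alpha_plus, beta) for a discrete cost distribution.

    support: cost values in increasing order; probs: their probabilities.
    alpha_plus is offered with probability beta, alpha_minus otherwise.
    """
    cdf = Fraction(0)
    lower = None  # last (value, CDF) pair with CDF < c
    for value, p in zip(support, probs):
        cdf += Fraction(p)
        if cdf == c:
            return value, value, Fraction(1)  # an exact alpha exists
        if cdf > c:
            if lower is None:
                raise ValueError("c is below the smallest CDF value")
            a_minus, c_minus = lower
            beta = (c - c_minus) / (cdf - c_minus)
            return a_minus, value, beta
        lower = (value, cdf)
    raise ValueError("c must be at most 1")

def acceptance_probability(support, probs, a_minus, a_plus, beta):
    """Pr[v <= alpha] over the cost draw and the random choice of alpha."""
    def F(a):
        return sum(Fraction(p) for v, p in zip(support, probs) if v <= a)
    return (1 - beta) * F(a_minus) + beta * F(a_plus)

# Hypothetical distribution with CDF steps 1/5, 1/2, 4/5, 1 and target c = 2/5.
support = [1, 2, 3, 4]
probs = [Fraction(1, 5), Fraction(3, 10), Fraction(3, 10), Fraction(1, 5)]
c = Fraction(2, 5)
a_minus, a_plus, beta = randomized_threshold(support, probs, c)
```

Here no single price hits $c = 2/5$, so the price 2 is offered with probability $\beta = 2/3$ and the price 1 otherwise.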

Cost of Mechanism. For a fixed $\epsilon$, when $c$ increases, the accuracy of Mechanism 1 improves, since Mechanism 1
uses more players' data. However, Mechanism 1's expected total payment also increases. Since
Mechanism 1 is $\sqrt{3\left(\frac{n(1-c)}{c} + \frac{2}{c^2\epsilon^2}\right)}$-accurate, there is a trade-off between the accuracy and the expected
total payment. Since the mechanism can pick $c$ freely, for a given $\epsilon >$ [...] the mechanism can pick
$c =$ [...]. Let $\alpha = \max_j \alpha_j$. The expected total payment is $\epsilon\alpha c n = \alpha n($[...]$)$. Then,
Mechanism 1 picks a suitable $\epsilon$ such that the expected total payment $\epsilon\alpha c n = B$. Hence, the mechanism is
budget-feasible in expectation and is $O(\frac{1}{\epsilon}) = O(\frac{\alpha n}{B})$-accurate.

Fixed Accuracy. If the data analyst wants a $k$-accurate mechanism, we can pick $c = \frac{1}{1 + k^2/(6n)}$ and
$\epsilon = \frac{2\sqrt{3}}{ck}$ such that the mechanism is $k$-accurate. The expected total payment is $\epsilon\alpha c n = \frac{2\sqrt{3}\,\alpha n}{k}$.

Computing $F_j^{-1}(c)$. In an ideal model, when $F_j$ is a continuous distribution, we assume that the mechanism
can access the closed form of $F_j$, such that the mechanism can compute $\alpha_j = F_j^{-1}(c)$ accurately. However,
when the mechanism cannot access the closed form of $F_j$, $F_j^{-1}(c)$ may not be computable. When it is
impossible to access the closed form of $F_j$, we assume that there is an oracle, which returns $F_j(v)$ for
any given value $v$. In the oracle model, the mechanism finds $\alpha_j^-$ and $\alpha_j^+$, for all $j$, such that $F_j(\alpha_j^-) \le c$,
$F_j(\alpha_j^+) \ge c$, and $\alpha_j^+ - \alpha_j^- \le \delta$ for $\delta \le 1/n$ using binary search. Then, the mechanism uses the method
that we use for discrete cost distributions to construct the contract. That is, let $c_j^- = F_j(\alpha_j^-)$, $c_j^+ = F_j(\alpha_j^+)$, and $\beta_j = \frac{c - c_j^-}{c_j^+ - c_j^-}$. Player $i$ is
given a contract "If $D_i = j$, your expected payment is $\epsilon\alpha_j$ in expectation," where $\Pr[\alpha_j = \alpha_j^-] = 1 - \beta_j$
and $\Pr[\alpha_j = \alpha_j^+] = \beta_j$. Thus, $\Pr[v_i \le \alpha_j] = c_j^- + \beta_j(c_j^+ - c_j^-) = c$, where the randomness is over the
distribution of costs and the random choice of $\alpha_j$. Hence, the mechanism is still perfectly data private, BIC,
EIIR, $\epsilon$-differentially private, and $O(1/\epsilon)$-accurate. In the oracle model, the expected payment for player $i$
with $D_i = j$ who accepts the contract is at most $\alpha_j^+$. In the ideal model, the expected payment for player $i$
with $D_i = j$ who accepts the contract is $F_j^{-1}(c)$, which is at most $\alpha_j^+$. Since $\alpha_j^+ - F_j^{-1}(c)$ is at most
$\delta \le 1/n$, the difference between the expected payments in the ideal model and in the oracle model is at
most $1/n$ for each player. Thus, the difference between the expected total payment in the ideal model and
in the oracle model is at most 1.
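The binary search in the oracle model can be sketched as below. The oracle is any callable returning $F_j(v)$; the initial interval `[lo, hi]`, the exponential example distribution, and the function name are assumptions of this example.

```python
import math

def bracket_quantile(cdf_oracle, c, lo, hi, delta):
    """Binary search using only oracle calls F(v).

    Returns (a_minus, a_plus) with F(a_minus) <= c <= F(a_plus)
    and a_plus - a_minus <= delta.
    """
    if not (cdf_oracle(lo) <= c <= cdf_oracle(hi)):
        raise ValueError("[lo, hi] must bracket the c-quantile")
    while hi - lo > delta:
        mid = (lo + hi) / 2.0
        if cdf_oracle(mid) <= c:
            lo = mid  # invariant: F(lo) <= c
        else:
            hi = mid  # invariant: F(hi) >= c
    return lo, hi

# Hypothetical oracle: exponential costs with rate 1, n = 1000 players.
n = 1000
oracle = lambda v: 1.0 - math.exp(-v)
a_minus, a_plus = bracket_quantile(oracle, c=0.5, lo=0.0, hi=10.0, delta=1.0 / n)
```

Each iteration halves the interval, so the number of oracle calls is logarithmic in $(hi - lo)/\delta$.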

4 Optimality 

In this section, we define a benchmark for the expected payment of a mechanism and compare the expected 
payment of Mechanism 1 to this benchmark in two different settings. The benchmark mechanism is not 
only truthful but also knows Di for all i and has no privacy requirements. We show that when all costs are 
non-negative, Mechanism 1 is provably close to the benchmark. 

The benchmark is the minimum expected payment among all truthful mechanisms $M^*$ that satisfy the
following properties. In order to get a meaningful estimate, for any $k < n/2$, a $k$-accurate mechanism
learns a subset of players' data. We call this subset a sample set. Since obtaining an estimate based on an
unbiased sample is a common approach in statistics, we assume that $M^*$ uses an unbiased sample. Suppose
that there are $n_j$ players with $D_i = j$ for $j \in [h]$. Since the sample set is unbiased, there exists $c$ such that
$M^*$ buys $w_j = c n_j$ data from players with $D_i = j$. After getting an unbiased sample, $M^*$ uses $w_1/c$ as the
straightforward estimate of $n_1$. Since the choice of $c$ may affect the accuracy guarantee, we compare the
payment of Mechanism 1 to the payment of $M^*$, where Mechanism 1 and $M^*$ have sample sets of the same
size. Thus, $M^*$ is a truthful mechanism that gets an unbiased sample of size $cn$ for a fixed number $c$.

Since there is no competition between players with data $j$ and players with data $j' \ne j$, $M^*$ can run
auctions for players with $D_i = j$ for all $j \in [h]$ independently and buy $w_j$ data from players with $D_i = j$.
A mechanism that guarantees buying $w$ units is called a $w$-unit procurement mechanism. Thus, $M^*$ is a
mechanism that runs a truthful, $w_j$-unit procurement mechanism for each $j \in [h]$.

Mechanism 1 buys in expectation $w_j$ data from players with $D_i = j$ for $j \in [h]$. We compare the
expected payment of Mechanism 1 for buying in expectation $w_j$ data from players with $D_i = j$ with the
expected payment of $M^*$ for buying $w_j$ data from players with $D_i = j$ for each $j$. If the expected payment
of Mechanism 1 is at most $r$ times the expected payment of $M^*$ for each $j$, then the total expected payment
of Mechanism 1 is at most $r$ times the total expected payment of $M^*$. Thus, we focus on a single auction
in which all players have the same $D_i$ and both Mechanism 1 and $M^*$ want to buy $w$ data from $n$ players.

For multi-unit procurement mechanisms, let $x_i$ be the indicator random variable denoting whether the
mechanism buys from player $i$. Let $v_i$ be the cost to player $i$ if $x_i = 1$. Let $p_i$ be the payment of player
$i$. The utility for player $i$ is $p_i - x_i v_i$. Note that when we consider privacy preserving mechanisms, the
utility of player $i$ is $p_i - \epsilon x_i v_i$. However, since $\epsilon$ is the same for all players, we can ignore $\epsilon$ without loss
of generality, that is, by scaling every player's cost by $\epsilon$. Without loss of generality, we suppose that players
report costs $v_1 \le v_2 \le \cdots \le v_n$.

4.1 Envy-free Benchmark 

A mechanism is envy-free if for all $v$ and for all $i, j$, $p_i - v_i x_i \ge p_j - v_i x_j$. We show that for any
envy-free, multi-unit procurement mechanism, every data item that is bought by the mechanism is purchased
at the same price. Suppose that a multi-unit procurement mechanism buys data from two players at two
different prices. Since the player with the lower price would prefer the higher price, the mechanism is not
envy-free. We compare the expected payment of Mechanism 1 with the expected payment of the optimal,
envy-free, dominant strategy truthful, multi-unit procurement mechanism. We use envy-free mechanisms
as a benchmark, because for procurement mechanisms in a Bayesian setting, the optimal mechanisms are
known to charge a fixed price.†

We introduce another commonly used solution concept as follows. A profile of strategies $q_1, \ldots, q_n$ is a
dominant strategy equilibrium if for all $i$, $v_i$, $v_{-i}$, and $y'_i \in Y$, $E[u_i(q(v_i, v_{-i}, D), v_i)] \ge E[u_i((y'_i, q_{-i}(v_{-i}, D_{-i})), v_i)]$,
where the randomness is from the mechanism. A direct mechanism is dominant strategy truthful if $q_i(v_i, D_i) =
v_i$ is a dominant strategy equilibrium for every player $i$. The following lemma characterizes the total
payment for any dominant strategy truthful, EPIR, and envy-free mechanism.

Lemma 4.1 (Theorem 4.6 in El). No dominant strategy truthful, EPIR, and envy-free w-unit procurement 
mechanism can have total payment less than $w v_{w+1}$. □

Let $F$ be the cumulative distribution function of players' costs, that is, $F(a) = \Pr[v \le a]$. By Lemma
4.1, the total expected payment of any dominant strategy truthful, EPIR, and envy-free $w$-unit procurement
mechanism is at least $w E_{v \sim F}[v_{w+1}]$. Thus, our benchmark is $w E_{v \sim F}[v_{w+1}]$.

Now, we compare the benchmark with the expected payment of Mechanism 1. There are two cases.
First, when there exists $\alpha$ such that $F(\alpha) = \frac{w}{n}$, Mechanism 1 offers a posted price $\alpha$ for each player in order
to buy $w$ players' data in expectation. If player $i$ accepts the price, the mechanism buys from player $i$ with
expected payment $\alpha$. Since each player has probability $\frac{w}{n}$ to accept the contract, the total expected payment
of Mechanism 1 is $w\alpha$.

Second, when there is no $\alpha$ such that $F(\alpha) = \frac{w}{n}$, we give an extension to Mechanism 1 in Section
3.1. The extension finds the largest $\alpha^-$ and the smallest $\alpha^+$ such that $F(\alpha^-) < \frac{w}{n}$ and $F(\alpha^+) > \frac{w}{n}$. Let
$c^- := F(\alpha^-)$, $c^+ := F(\alpha^+)$, and $\beta := \frac{w/n - c^-}{c^+ - c^-}$. Then, the mechanism offers a price $\alpha^+$ with probability
$\beta$ and price $\alpha^-$ with probability $1 - \beta$. For a player with cost at most $\alpha^-$, since the player always accepts
the offer, the expected payment is $(\alpha^-(1 - \beta) + \alpha^+\beta)$. For a player with cost equal to $\alpha^+$, since the
player accepts the offer only when the offered price is $\alpha^+$, the expected payment is $\alpha^+\beta$. For a player
with cost larger than $\alpha^+$, since the player always rejects the offer, the expected payment is 0. Since each
player has a cost at most $\alpha^-$ with probability $c^-$ and has a cost equal to $\alpha^+$ with probability $c^+ - c^-$, each
player's expected payment is $c^-(\alpha^-(1 - \beta) + \alpha^+\beta) + (c^+ - c^-)\alpha^+\beta$. Thus, the total expected payment is
$n(c^-(\alpha^-(1 - \beta) + \alpha^+\beta) + (c^+ - c^-)\alpha^+\beta)$ by the linearity of expectation. Moreover,

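\[
\begin{aligned}
n\bigl(c^-(\alpha^-(1 - \beta) + \alpha^+\beta) + (c^+ - c^-)\alpha^+\beta\bigr)
&= n\bigl(c^-(\alpha^-(1 - \beta) + \alpha^+\beta) + (\tfrac{w}{n} - c^-)\alpha^+\bigr) \\
&= n\bigl(\tfrac{w}{n}\alpha^+ + c^-(\alpha^-(1 - \beta) + \alpha^+\beta - \alpha^+)\bigr) \\
&= n\bigl(\tfrac{w}{n}\alpha^+ + c^-(1 - \beta)(\alpha^- - \alpha^+)\bigr) \\
&= n\bigl(\tfrac{w}{n}\alpha^+ - c^-(1 - \beta)(\alpha^+ - \alpha^-)\bigr) \\
&= w\bigl(\alpha^+ - \tfrac{n c^-}{w}(1 - \beta)(\alpha^+ - \alpha^-)\bigr).
\end{aligned}
\]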

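When there exists $\alpha$ such that $F(\alpha) = \frac{w}{n}$, the expected payment of Mechanism 1 is $w\alpha$. When $\alpha$ does
not exist, the expected payment is $w(\alpha^+ - \frac{n c^-}{w}(1 - \beta)(\alpha^+ - \alpha^-))$. Thus, we should compare both $w\alpha$ and
$w(\alpha^+ - \frac{n c^-}{w}(1 - \beta)(\alpha^+ - \alpha^-))$ with $w E_{v \sim F}[v_{w+1}]$. It suffices to compare $\alpha$ and $\alpha^+ - \frac{n c^-}{w}(1 - \beta)(\alpha^+ - \alpha^-)$
with $E_{v \sim F}[v_{w+1}]$.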
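As a sanity check on the closed form, the sketch below evaluates the total expected payment both directly, as $n$ times the per-player expected payment, and through the expression $w(\alpha^+ - \frac{n c^-}{w}(1-\beta)(\alpha^+ - \alpha^-))$, using exact rational arithmetic. The function names and the sample numbers are hypothetical.

```python
from fractions import Fraction as Fr

def beta_of(n, w, c_minus, c_plus):
    """beta = (w/n - c^-) / (c^+ - c^-)."""
    return (Fr(w, n) - c_minus) / (c_plus - c_minus)

def total_payment_direct(n, w, a_minus, a_plus, c_minus, c_plus):
    """n times the per-player expected payment under the randomized price."""
    b = beta_of(n, w, c_minus, c_plus)
    per_player = (c_minus * (a_minus * (1 - b) + a_plus * b)
                  + (c_plus - c_minus) * a_plus * b)
    return n * per_player

def total_payment_closed_form(n, w, a_minus, a_plus, c_minus, c_plus):
    """w (a^+ - (n c^- / w)(1 - beta)(a^+ - a^-))."""
    b = beta_of(n, w, c_minus, c_plus)
    return w * (a_plus - Fr(n) * c_minus / w * (1 - b) * (a_plus - a_minus))

# Hypothetical instance: n = 10, w = 4, prices 2 and 5, CDF values 3/10 and 6/10.
args = (10, 4, Fr(2), Fr(5), Fr(3, 10), Fr(6, 10))
direct = total_payment_direct(*args)
closed = total_payment_closed_form(*args)
```

Both expressions evaluate to 14 on this instance, with $\beta = 1/3$.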

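Lemma 4.2. 1. If there exists $\alpha$ such that $F(\alpha) = \frac{w}{n}$, then $E_{v \sim F}[v_{w+1}] \ge \frac{1}{2}\alpha$.

2. If there is no $\alpha$ such that $F(\alpha) = \frac{w}{n}$, then $E_{v \sim F}[v_{w+1}] \ge \frac{1}{2}(\alpha^+ - \frac{n c^-}{w}(1 - \beta)(\alpha^+ - \alpha^-))$.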

†Envy-free benchmarks are also common in prior-free mechanism design [10].

Proof. We show the second statement. The first statement follows by setting $\alpha^- = \alpha^+ = \alpha$.
Let $\eta = \alpha^+ - \frac{n c^-}{w}(1 - \beta)(\alpha^+ - \alpha^-)$. By conditional probability,

\[
\begin{aligned}
E[v_{w+1}] &= \Pr[v_{w+1} < \eta] \times E[v_{w+1} \mid v_{w+1} < \eta] + \Pr[v_{w+1} \ge \eta] \times E[v_{w+1} \mid v_{w+1} \ge \eta] \\
&\ge \Pr[v_{w+1} \ge \eta] \times \eta \quad \text{(costs are non-negative)}.
\end{aligned}
\]

It suffices to show that $\Pr[v_{w+1} \ge \eta] \ge \frac{1}{2}$. Since $c^- < \frac{w}{n}$, $\frac{n c^-}{w} < 1$. Since $\beta \le 1$ and $\frac{n c^-}{w} < 1$,
$\alpha^- < \eta \le \alpha^+$. If $v_{w+1} \ge \eta$, then $v_{w+1} \ge \alpha^+$, since $\alpha^+$ is the smallest number larger than $\alpha^-$ with
non-zero probability. Let $v_{(i)}$ denote the cost of player $i$. If $v_{w+1} \ge \alpha^+$, then at most $w$ players' $v_{(i)}$
are no larger than $\alpha^-$. Since each $v_{(i)}$ is independently drawn from $F$, $\Pr[v_{(i)} \le \alpha^-] = c^-$. Let $X_i$ be
the indicator random variable such that $X_i = 1$ if $v_{(i)} \le \alpha^-$, otherwise $X_i = 0$. Let $X = \sum_{i=1}^{n} X_i$. The
probability that at most $w$ players have $v_{(i)}$ no larger than $\alpha^-$ is $\Pr[X \le w]$. Since the $X_i$'s are independent,
identical, indicator random variables, $X$ is a random variable from a binomial distribution $\mathrm{Bin}(n, c^-)$. Thus,
$\Pr[v_{w+1} \ge \alpha^+] = \Pr[\mathrm{bin}(n, c^-) \le w]$.

Now, we show that $\Pr[\mathrm{bin}(n, c^-) \le w] \ge \frac{1}{2}$. We say $m$ is the median of a distribution $D$ over real
numbers if $\Pr[Z \le m] \ge \frac{1}{2}$ and $\Pr[Z \ge m] \ge \frac{1}{2}$, where $Z$ is a random variable drawn from $D$. For a
binomial distribution $\mathrm{Bin}(n, p)$, the expected value $np$ and the median $m$ satisfy $\lfloor np \rfloor \le m \le \lceil np \rceil$ [11].
Since $c^- < \frac{w}{n}$, the expected value of $\mathrm{bin}(n, c^-)$ is smaller than $w$. Since $\lceil n c^- \rceil \le w$, the median $m$ of
$\mathrm{Bin}(n, c^-)$ is at most $w$. Thus, $\Pr[\mathrm{bin}(n, c^-) \le w] \ge \frac{1}{2}$. □
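The median step can be checked numerically. The following sketch computes $\Pr[\mathrm{bin}(n, p) \le w]$ exactly from the probability mass function and confirms that it is at least $1/2$ on a grid of values with $np < w$. The grid and the function names are assumptions of this example, and a finite check of this kind is an illustration, not a proof.

```python
import math

def binom_cdf(n, p, w):
    """Pr[bin(n, p) <= w], summed from the probability mass function."""
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(w + 1))

def check_median_bound(n, steps=20):
    """Check Pr[bin(n, p) <= w] >= 1/2 for w = 1..n and p on a grid below w/n."""
    for w in range(1, n + 1):
        for step in range(1, steps):
            p = (w / n) * step / steps  # strictly below w/n, so n*p < w
            if binom_cdf(n, p, w) < 0.5:
                return False
    return True

ok = check_median_bound(30)
```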

Lemmas 4.1 and 4.2 imply the following theorem.

Theorem 4.3. Mechanism 1's expected payment is 2-approximate to the benchmark. □
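For a concrete feel of the comparison, the sketch below takes costs that are i.i.d. uniform on $[0, 1]$ (an assumption chosen for this example, with costs already scaled by $\epsilon$), estimates the benchmark $w E[v_{w+1}]$ by Monte Carlo, and compares it with the posted-price payment $w\alpha$, where $F(\alpha) = w/n$ gives $\alpha = w/n$. The function names and parameters are hypothetical.

```python
import random

def benchmark_mc(n, w, trials, seed=0):
    """Monte Carlo estimate of w * E[v_(w+1)] for n i.i.d. uniform[0,1] costs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        costs = sorted(rng.random() for _ in range(n))
        total += costs[w]  # (w+1)-th smallest cost (0-based index w)
    return w * total / trials

def posted_price_payment(n, w):
    """Expected payment w * alpha with F(alpha) = w/n; here alpha = w/n."""
    return w * (w / n)

n, w = 20, 5
bench = benchmark_mc(n, w, trials=5000)
posted = posted_price_payment(n, w)
```

For the uniform distribution the $(w+1)$-th order statistic has mean $(w+1)/(n+1)$, so the benchmark is close to $5 \cdot 6/21$ here and the posted-price payment of $1.25$ is well within a factor of 2 of it.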

4.2 Anti-regular Distributions 

In this section, we compare the expected payment of Mechanism 1 with the expected payment of the optimal
BIC, multi-unit procurement mechanism. We first characterize randomized BIC procurement mechanisms.
For a randomized mechanism and a given bid $v_i$, let $x_i(v_i)$ be the probability that the mechanism buys from
player $i$ and let $p_i(v_i)$ be the random variable denoting the payment for player $i$, where both $x_i$ and $p_i$'s
randomness come from the mechanism and $v_{-i}$. Suppose that when $v_i = \infty$, the mechanism will not buy
from player $i$. That is, $x_i(\infty) = 0$ and $E[p_i(\infty)] = 0$. The characterization of BIC procurement
mechanisms is analogous to the characterization of BIC selling mechanisms, which is a well-known result
in auction theory. We provide a proof of the following characterization in the Appendix.

Lemma 4.4. A randomized procurement mechanism is BIC if and only if for every $i$ the procurement
probability $x$ and payment $p$ satisfy

(i) $x_i(v_i)$ is decreasing in $v_i$;

(ii) $E[p_i(v_i)] = v_i x_i(v_i) + \int_{v_i}^{\infty} x_i(t)\, dt$. □

To prove the optimality of selling mechanisms, Myerson [14] introduces a virtual value function. The
analogous function for procurement mechanisms is a virtual cost function, which is $\phi(z) := z + \frac{F(z)}{f(z)}$. Thus,
to ensure that $\phi(z)$ is well-defined and the integral of $f$ is well-defined (used in the proof of Lemma 4.5 and
Lemma 4.7), we assume

Assumption 1. Let $f$ be the probability density function of distribution $F$ with range $[a, b] \subseteq [0, \infty)$. $f$ is
piecewise continuous and $f(z)$ is positive for all $z \in [a, b]$.

A distribution $F$ is anti-regular if $F$ satisfies Assumption 1 and $\phi(z)$ is increasing in $z$.‡

‡For selling mechanisms, a distribution is regular if the virtual value $\phi(z) = z - \frac{1 - F(z)}{f(z)}$ is increasing in $z$.

When the distribution $F$ is anti-regular, [6] characterize the optimal dominant strategy truthful
mechanism to minimize the expected payment for path auctions. Although their problem is not exactly the same
as $w$-unit procurement mechanisms, their result can be extended to procurement mechanisms easily. For
completeness, we provide a proof of the following lemma for our setting in the Appendix.

Lemma 4.5. When the distribution F is anti-regular, the optimal BIC w-unit procurement buys from the w 
players with the smallest virtual cost. □ 

Since $\phi(z)$ is increasing in $z$, the optimal mechanism buys from the first $w$ players. By Lemma 4.4,
the expected payment for player $i \le w$ is $v_{w+1}$. Thus, the total expected payment of the optimal BIC
mechanism is $w E_{v \sim F}[v_{w+1}]$. Thus, our benchmark is $w E_{v \sim F}[v_{w+1}]$. We compare the expected payment
of Mechanism 1 with $w E_{v \sim F}[v_{w+1}]$, when $F$ is anti-regular.

Theorem 4.6. When $F$ is anti-regular, Mechanism 1's expected payment is 2-approximate to the benchmark.

Proof. Since $F$ satisfies Assumption 1 by definition of anti-regular, $F^{-1}$ is well-defined. The total expected
payment of Mechanism 1 is $w F^{-1}(\frac{w}{n})$. When $F$ is anti-regular, the benchmark is $w E_{v \sim F}[v_{w+1}]$. By
Lemma 4.2, Mechanism 1 is 2-approximate. □



4.3 General Distributions 

When the distribution satisfies Assumption 1 but $\phi(z)$ is not increasing in $z$, buying from the $w$ players with
smallest virtual cost may result in a non-truthful mechanism. We can use the ironing procedure, which was
designed by Myerson [14], to resolve this issue. For a fixed cost vector $v$, the ironing procedure irons on an interval
$[a, b)$ as follows: if $v_i \in [a, b)$, then $v_i$ is replaced by a random number $v'_i$, which is drawn from the distribution $F$ on
$[a, b)$. In a way similar to Myerson's method, we can identify a set $S$ of intervals, such that the ironed
virtual cost function $\bar{\phi}(z) = E[\phi(z)]$ is increasing in $z$. Moreover, for an ironed interval $[a, b)$, $\bar{\phi}(z)$ is the
same for all $z \in [a, b)$. The formal definitions of the ironed interval set $S$ and the ironed virtual cost function
are in the appendix.

Lemma 4.7. The w-unit procurement mechanism that buys from the w players with smallest ironed virtual 
cost and breaks ties uniformly at random is the optimal BIC mechanism when the distribution satisfies 
Assumption 1. □ 

Thus, our benchmark is the expected payment of the optimal BIC mechanism, $M$, when the distribution
satisfies Assumption 1. In order to calculate the expected payment of $M$, we specify the payment rule as
follows. Let $x_i(v_i, v_{-i})$ be the probability that $M$ buys from player $i$, where the randomness comes from the
mechanism. Since $M$ buys from the $w$ players with smallest ironed virtual cost, $x_i(v_i, v_{-i})$ is decreasing
in $v_i$ for any fixed $v_{-i}$. Let $p_i(v_i, v_{-i})$ be the random variable denoting the payment for player $i$, where
$E[p_i(v_i, v_{-i})] = v_i x_i(v_i, v_{-i}) + \int_{v_i}^{\infty} x_i(t, v_{-i})\, dt$ and the randomness comes from the mechanism. It is easy
to see that this payment rule satisfies Lemma 4.4.

We compare the expected payment of Mechanism 1 with the benchmark. 

Theorem 4.8. Let $F$ satisfy Assumption 1. Let $S$ be the set of ironed intervals for $F$. If every interval
$[a, b) \in S$ satisfies $a \ge b/r$ for some $r \ge 1$, then the expected payment of Mechanism 1 is $2r$-approximate
to the benchmark.

Proof. Since $F$ satisfies Assumption 1, $F^{-1}$ is well-defined. The expected payment of Mechanism 1 is
$w F^{-1}(\frac{w}{n})$. We compare the expected payment of the optimal BIC mechanism, $M$, with $w F^{-1}(\frac{w}{n})$. Let
$P_i(v)$ be the random variable representing the payment for player $i$ in $M$ when the cost vector is $v$. We
show that $E_{v \sim F, M}[\sum_{i \in [n]} P_i(v)] \ge w E_{v \sim F}[v_{w+1}]/r$, which implies $E_{v \sim F, M}[\sum_{i \in [n]} P_i(v)] \ge w F^{-1}(\frac{w}{n})/2r$
by Lemma 4.2, and hence Mechanism 1 is $2r$-approximate.

There are two sources of randomness in mechanism $M$. One is from the cost vector $v$, since $v$ is drawn
from a distribution $F$. Another one is $M$ itself, since $M$ is a randomized mechanism. For a fixed cost vector
$v$, let $P_i$ be the random variable representing the payment for player $i$, where the randomness only comes
from $M$. We show that for any fixed $v$, $E_M[\sum_{i \in [n]} P_i] \ge w v_{w+1}/r$. This implies $E_{v \sim F, M}[\sum_{i \in [n]} P_i(v)] \ge
w E_{v \sim F}[v_{w+1}]/r$. There are three cases.

Case 1: $v_{w+1}$ is not in any ironed interval. Since $M$ chooses the $w$ players with smallest ironed virtual costs
and the ironed virtual cost is increasing, $M$ buys from the first $w$ players. For player $i \le w$, if $v_i$ increases
to $t < v_{w+1}$, by the monotonicity of $\bar{\phi}$, the mechanism still buys from player $i$. That is, $x_i(t, v_{-i}) = 1$ for all
$t < v_{w+1}$. When $t > v_{w+1}$, the mechanism will not buy from player $i$. Thus, by definition of the expected
payment, the expected payment for each player $i \le w$ is $v_i + \int_{v_i}^{\infty} x_i(t, v_{-i})\, dt = v_i + \int_{v_i}^{v_{w+1}} x_i(t, v_{-i})\, dt =
v_{w+1}$. Since the expected payment for player $i > w$ is 0, $E_M[\sum_{i \in [n]} P_i] = w v_{w+1}$.

Case 2: $v_{w+1}$ is in an ironed interval $[a, b)$ but $v_w \notin [a, b)$. Since for every player $i \le w$, $v_i \notin [a, b)$, $M$
buys from the first $w$ players. For player $i \le w$, $x_i(t, v_{-i}) = 1$ for all $t < a$. By definition of the expected
payment, the expected payment for player $i \le w$ is $v_i + \int_{v_i}^{\infty} x_i(t, v_{-i})\, dt \ge v_i + \int_{v_i}^{a} x_i(t, v_{-i})\, dt = a$. Thus,
$E_M[\sum_{i \in [n]} P_i] \ge wa \ge wb/r \ge w v_{w+1}/r$.

Case 3: $v_{w+1}$ and $v_w$ are in the same ironed interval $[a, b)$. Let $l_1 = |\{i : v_i < a\}|$ and $l_2 = |\{i : v_i \in
[a, b)\}|$. Thus, $l_1 < w$ and $l_1 + l_2 > w$. The mechanism always buys from the first $l_1$ players. Since $\bar{\phi}(t)$ is the
same for all $t \in [a, b)$ and the mechanism breaks ties uniformly at random, the mechanism buys from player
$i$, $l_1 + 1 \le i \le l_1 + l_2$, with probability $\frac{w - l_1}{l_2}$. For player $i \le l_1$, $x_i(t, v_{-i}) = 1$ if $t < a$. By definition of the
expected payment, the expected payment for player $i \le l_1$ is $v_i + \int_{v_i}^{\infty} x_i(t, v_{-i})\, dt \ge v_i + \int_{v_i}^{a} x_i(t, v_{-i})\, dt =
a$. For player $i$, $l_1 < i \le l_1 + l_2$, when $v_i$ increases to $t < b$, since $\bar{\phi}(t)$ is the same for all $t \in [a, b)$,
the probability that the mechanism buys from player $i$ does not change. That is, $x_i(t, v_{-i}) = \frac{w - l_1}{l_2}$ if
$t \in [a, b)$. By definition of the expected payment, the expected payment for player $i$, $l_1 < i \le l_1 + l_2$, is
$v_i x_i(v_i, v_{-i}) + \int_{v_i}^{\infty} x_i(t, v_{-i})\, dt = \frac{b(w - l_1)}{l_2}$. Therefore, $E_M[\sum_{i \in [n]} P_i] \ge a l_1 + l_2 \frac{b(w - l_1)}{l_2} \ge wa \ge wb/r \ge
w v_{w+1}/r$. □

A Optimal Procurement Mechanisms

We first characterize the BIC randomized procurement mechanisms in a way similar to Myerson's
characterization of truthful selling mechanisms [14, 15]. We assume $x_i(\infty) = 0$ and $E[p_i(\infty)] = 0$.

Lemma A.1 (Lemma 4.4). A randomized procurement mechanism is BIC if and only if for every $i$ the
procurement probability $x$ and payment $p$ satisfy

(i) $x_i(v_i)$ is decreasing in $v_i$;

(ii) $E[p_i(v_i)] = v_i x_i(v_i) + \int_{v_i}^{\infty} x_i(t)\, dt$.

Proof. (If) Suppose that (i) and (ii) hold. We need to show that for all $v'_i$, $E[p_i(v_i)] - v_i x_i(v_i) \ge E[p_i(v'_i)] - v_i x_i(v'_i)$. By (ii), this is equivalent
to showing $\int_{v_i}^{\infty} x_i(t)\, dt \ge \int_{v'_i}^{\infty} x_i(t)\, dt + (v'_i - v_i) x_i(v'_i)$. If $v'_i > v_i$, then it is equivalent to $\int_{v_i}^{v'_i} x_i(t)\, dt \ge (v'_i - v_i) x_i(v'_i)$,
which is true due to the monotonicity of $x_i$. If $v'_i < v_i$, it is equivalent to $(v_i - v'_i) x_i(v'_i) \ge \int_{v'_i}^{v_i} x_i(t)\, dt$, which is true
due to the monotonicity of $x_i$.

(Only if) Since the mechanism is BIC, for all $v_i$ and $v'_i$, $E[p_i(v_i)] - v_i x_i(v_i) \ge E[p_i(v'_i)] - v_i x_i(v'_i)$.
Symmetrically, we have $E[p_i(v_i)] - v'_i x_i(v_i) \le E[p_i(v'_i)] - v'_i x_i(v'_i)$. Combining the two inequalities, we
get $(v'_i - v_i) x_i(v_i) \ge (v'_i - v_i) x_i(v'_i)$, which implies (i). By rearranging these two inequalities, we get
$v'_i (x_i(v_i) - x_i(v'_i)) \ge E[p_i(v_i)] - E[p_i(v'_i)] \ge v_i (x_i(v_i) - x_i(v'_i))$. Let $v'_i = v_i + \epsilon$, and divide all by $\epsilon$.
When $\epsilon \to 0$, both sides have the same value. Thus, we get $\frac{dE[p_i(v_i)]}{dv_i} = v_i \frac{dx_i(v_i)}{dv_i}$. Since $x_i(\infty) = 0$ implies
$E[p_i(\infty)] = 0$, we have $E[p_i(v_i)] = -\int_{v_i}^{\infty} v\, x'_i(v)\, dv$. Applying integration by parts, we can get (ii). □

When the costs of players are drawn from a publicly known distribution $F$, we characterize the optimal
BIC mechanism to minimize the payment, when $F$ is anti-regular. In [14], Myerson characterizes the
optimal BIC mechanism to maximize the revenue for selling mechanisms assuming the distribution is regular.
The proof of Lemma A.2 follows the proof of Myerson's characterization [15].



Lemma A.2 (Lemma 4.5). When the distribution $F$ is anti-regular, the optimal BIC $w$-unit procurement
buys from the $w$ players with the smallest virtual cost.

Proof. Let $\phi(z) = z + \frac{F(z)}{f(z)}$. Suppose that for any BIC mechanism, the expected payment is equal to its
expected virtual cost, that is, $E_{v \sim F}[\sum_{i \in [n]} p_i(v)] = E_{v \sim F}[\sum_{i \in [n]} \phi(v_i) x_i(v)]$. This implies that if the
mechanism buys from the $w$ players with the smallest virtual cost, then the mechanism minimizes the
payment. Moreover, since $F$ is anti-regular, $\phi(z) = z + \frac{F(z)}{f(z)}$ is increasing in $z$. Since the mechanism
buys from the $w$ players with smallest virtual cost, $x_i(v_i)$ is decreasing in $v_i$ for all $i$. Hence, the mechanism is
BIC. Thus, it suffices to show that $E_{v \sim F}[\sum_{i \in [n]} p_i(v)] = E_{v \sim F}[\sum_{i \in [n]} \phi(v_i) x_i(v)]$.

In order to show that the expected payment is equal to its expected virtual cost, it suffices to show that
the expected payment of player $i$ is $E_{v_i \sim F}[\phi(v_i) x_i(v_i)]$, since each $v_i$ is drawn from $F$ independently.

Lemma A.3. The expected payment of player $i$ is $E_{v_i \sim F}[\phi(v_i) x_i(v_i)]$.

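Proof. Since the density function $f$ is piecewise continuous, there exists a partition $[a_1, b_1], \ldots, [a_h, b_h]$ of
$f$'s domain, such that $f$ is continuous within every interval $[a_j, b_j]$. Note that $b_j = a_{j+1}$ for all $1 \le j \le h - 1$.

\[
\begin{aligned}
E_{v_i}[p_i(v_i)] &= \sum_{j=1}^{h} \int_{a_j}^{b_j} E[p_i(v_i)] f(v_i)\, dv_i \\
&= \sum_{j=1}^{h} \left( \int_{a_j}^{b_j} v_i x_i(v_i) f(v_i)\, dv_i + \int_{a_j}^{b_j} \int_{v_i}^{b_h} x_i(z) f(v_i)\, dz\, dv_i \right) \quad \text{(Lemma A.1)} \\
&= \sum_{j=1}^{h} \left( \int_{a_j}^{b_j} v_i x_i(v_i) f(v_i)\, dv_i + \int_{a_j}^{b_j} x_i(z) \int_{a_j}^{z} f(v_i)\, dv_i\, dz + \int_{b_j}^{b_h} x_i(z) \int_{a_j}^{b_j} f(v_i)\, dv_i\, dz \right) \quad \text{(switch the order of integration)} \\
&= \sum_{j=1}^{h} \left( \int_{a_j}^{b_j} v_i x_i(v_i) f(v_i)\, dv_i + \int_{a_j}^{b_j} x_i(z)(F(z) - F(a_j))\, dz + \int_{b_j}^{b_h} x_i(z)(F(b_j) - F(a_j))\, dz \right) \\
&= \sum_{j=1}^{h} \left( \int_{a_j}^{b_j} x_i(v_i)(v_i f(v_i) + F(v_i))\, dv_i - F(a_j) \int_{a_j}^{b_j} x_i(v_i)\, dv_i + (F(b_j) - F(a_j)) \int_{b_j}^{b_h} x_i(v_i)\, dv_i \right) \\
&= \sum_{j=1}^{h} \left( \int_{a_j}^{b_j} x_i(v_i)(v_i f(v_i) + F(v_i))\, dv_i - F(a_j) \int_{a_j}^{b_h} x_i(v_i)\, dv_i + F(b_j) \int_{b_j}^{b_h} x_i(v_i)\, dv_i \right) \\
&= \sum_{j=1}^{h} \int_{a_j}^{b_j} x_i(v_i)\left(v_i + \frac{F(v_i)}{f(v_i)}\right) f(v_i)\, dv_i - \sum_{j=1}^{h} \left( F(a_j) \int_{a_j}^{b_h} x_i(v_i)\, dv_i - F(b_j) \int_{b_j}^{b_h} x_i(v_i)\, dv_i \right) \\
&= E_{v_i}[\phi(v_i) x_i(v_i)] \quad (F(a_1) = 0,\ b_j = a_{j+1} \text{ for all } 1 \le j \le h - 1). \qquad □
\end{aligned}
\]

(End of proof of Lemma A.2) □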

Now, we consider the case that F satisfies Assumption 1 but (j){z) is not monotone in z. For selling 
mechanisms, Myerson [14] designs an ironing procedure to get the optimal BIC mechanism to maximize 
the revenue when F satisfies Assumption 1 . We show how to iron virtual values in the setting of procurement 
mechanism and use this to design an optimal BIC mechanism to minimize the payment. 

Suppose that $\phi(z)$ is not monotone. We want to transform $\phi(z)$ to another function $\bar{\phi}(z)$, such that
$\bar{\phi}(z)$ is increasing in $z$. Let $q = F(v)$ and $h(q) = \phi(F^{-1}(q))$. Since the density function $f$ is always
positive, $F$ is strictly increasing. Thus, $\phi(z)$ is increasing in $z$ if and only if $h(q)$ is increasing in $q$.
Moreover, $h(q)$ is increasing in $q$ if and only if $H(q) = \int_0^q h(t)\, dt$ is convex. However, $H$ is not convex,
since $\phi(z)$ is not monotone. Thus, we want to modify $H$ to get a convex function $G$ and define $\bar{\phi}(z)$ based
on $G$.

Let $S$ be the epigraph of $H$, that is, $S = \{(q, y) \mid y \ge H(q)\}$. Geometrically, if we draw $y = H(q)$ on
a plane, then $S$ is the area containing $H$ and above $H$. Let $\mathrm{conv}(S)$ denote the convex hull of set $S$. The
convex hull of $H(q)$ is $G(q) = \min\{y \mid (q, y) \in \mathrm{conv}(S)\}$ (Chapter 5 in [17]). Geometrically, if we draw
$y = G(q)$ on a plane, then $G$ is the lower boundary of $\mathrm{conv}(S)$. By definition, a function is convex if its
epigraph is a convex set. Since the epigraph of $G$, $\mathrm{conv}(S)$, is a convex set, $G$ is convex. Since $G$ is the
lower boundary of $\mathrm{conv}(S)$, $G(q) \le H(q)$ for all $q \in [0, 1]$.

We define the ironed interval set and $\bar{\phi}(z)$ as follows. Let $T$ be the set of points where $H(q)$ and $G(q)$
differ, that is, $T = \{q \mid H(q) \ne G(q)\}$. Let $S$ be the smallest set of intervals $[y_i, z_i)$, such that $T =
\bigcup_i (y_i, z_i)$. The ironed interval set is defined as $\{[F^{-1}(y_i), F^{-1}(z_i)) \mid [y_i, z_i) \in S\}$. Since $G$ is convex, $G$
is differentiable on a dense subset of $[0, 1]$ by Theorem 25.5 in [17]. We define $g(q) := \frac{dG}{dq}(q)$, whenever
$\frac{dG}{dq}(q)$ is well-defined, and extend $g$ to $[0, 1]$ by right-continuity. The ironed virtual cost function is defined
as $\bar{\phi}(z) = g(F(z))$.
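A discretized sketch of this construction: given values of $h$ on equal-width quantile cells, it accumulates $H$, takes the lower convex hull of the points $(q, H(q))$ to obtain $G$, and reads off the slopes $g$ of $G$ on each cell. The piecewise-constant discretization, the monotone-chain hull routine, and the function names are assumptions of this example; the construction in the text works with the continuous functions.

```python
def lower_convex_hull(points):
    """Lower convex hull of points sorted by x (monotone chain)."""
    hull = []
    for p in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Pop hull[-1] if it lies on or above the segment hull[-2] -> p.
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def iron(h_values):
    """Return the ironed slopes g on each of len(h_values) equal quantile cells."""
    m = len(h_values)
    width = 1.0 / m
    H = [0.0]
    for hv in h_values:
        H.append(H[-1] + hv * width)  # H(q) = integral of h from 0 to q
    hull = lower_convex_hull([(k * width, H[k]) for k in range(m + 1)])
    g, seg = [], 0
    for k in range(m):
        mid = (k + 0.5) * width
        while hull[seg + 1][0] < mid:  # advance to the hull segment covering mid
            seg += 1
        (x1, y1), (x2, y2) = hull[seg], hull[seg + 1]
        g.append((y2 - y1) / (x2 - x1))  # slope of G on this cell
    return g

# Hypothetical non-monotone h: the dip from 3 to 2 is ironed to their average.
g = iron([1.0, 3.0, 2.0, 4.0])
```

On this input the two middle cells form one ironed interval with common value 2.5, while the monotone parts of $h$ are left unchanged.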

Lemma A.4 (Lemma 4.7). The $w$-unit procurement mechanism that buys from the $w$ players with smallest
ironed virtual cost and breaks ties uniformly at random is the optimal BIC mechanism when the distribution
satisfies Assumption 1.

Proof. Since $G$ is a convex function, $g(q)$ is increasing in $q$. Thus, $\bar{\phi}(z)$ is increasing in $z$. Since $\bar{\phi}(z)$ is
increasing in $z$ and the mechanism buys from the $w$ players with smallest ironed virtual cost, the mechanism
is BIC. We only need to show that the mechanism minimizes the payment. First, we want to relate the
mechanism's payment to $\bar{\phi}(z)$. Since the density function $f$ is piecewise continuous, there exists a partition
$[a_1, b_1], \ldots, [a_h, b_h]$ of $f$'s domain such that $f$ is continuous within every interval $[a_j, b_j]$. Note that
$b_j = a_{j+1}$ for all $1 \le j \le h - 1$. For any BIC mechanism, $x_i(v_i)$ is decreasing in $v_i$ by Lemma l4!4l. For a fixed player $i$,

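$$
\begin{aligned}
E_{v_i \sim F}[p_i(v_i)]
&= E_{v_i}[\phi(v_i) x_i(v_i)] && \text{(Lemma A.3)} \\
&= E_{v_i}[\bar{\phi}(v_i) x_i(v_i)] - E_{v_i}[(\bar{\phi}(v_i) - \phi(v_i)) x_i(v_i)] \\
&= E_{v_i}[\bar{\phi}(v_i) x_i(v_i)] - \sum_{j=1}^{h} \int_{a_j}^{b_j} (\bar{\phi}(v_i) - \phi(v_i)) x_i(v_i) f(v_i)\, dv_i \\
&= E_{v_i}[\bar{\phi}(v_i) x_i(v_i)] - \sum_{j=1}^{h} \int_{a_j}^{b_j} (g(F(v_i)) - h(F(v_i))) x_i(v_i) f(v_i)\, dv_i \\
&= E_{v_i}[\bar{\phi}(v_i) x_i(v_i)] - \sum_{j=1}^{h} (G(F(v_i)) - H(F(v_i))) x_i(v_i) \Big|_{v_i = a_j}^{v_i = b_j} \\
&\qquad - \sum_{j=1}^{h} \int_{a_j}^{b_j} (H(F(v_i)) - G(F(v_i)))\, dx_i(v_i) && \text{(integration by parts)} \\
&= E_{v_i}[\bar{\phi}(v_i) x_i(v_i)] - \sum_{j=1}^{h} \int_{a_j}^{b_j} (H(F(v_i)) - G(F(v_i)))\, dx_i(v_i).
\end{aligned}
$$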



The last equality holds since $G(0) = H(0)$ and $G(1) = H(1)$ by the definition of $G$, and $b_j = a_{j+1}$ for
all $1 \le j \le h - 1$. In the second term of the last line, the derivative of $x_i$ is non-positive, since $x_i(v_i)$ is
decreasing in $v_i$. Moreover, $H(F(v_i)) - G(F(v_i))$ is non-negative for all $v_i$, because $G(q) \le H(q)$ for all
$q \in [0, 1]$. In order to minimize the payment, we need to choose an allocation function $x_i$ to minimize the
magnitude of the second term. We show that the second term is zero when the mechanism buys from the $w$
players with smallest ironed virtual cost and breaks ties uniformly at random.
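
The cancellation of the boundary terms in the last equality can be written out explicitly. The sketch below uses the shorthand $W(v) := G(F(v)) - H(F(v))$, which is notation introduced here for convenience and does not appear in the original derivation. It also assumes that $f$'s domain is $[a_1, b_h]$, so that $F(a_1) = 0$ and $F(b_h) = 1$.

```latex
\begin{aligned}
\sum_{j=1}^{h} W(v_i)\, x_i(v_i) \Big|_{v_i = a_j}^{v_i = b_j}
&= \sum_{j=1}^{h} \bigl( W(b_j)\, x_i(b_j) - W(a_j)\, x_i(a_j) \bigr) \\
&= W(b_h)\, x_i(b_h) - W(a_1)\, x_i(a_1)
   && \text{(telescoping, since } b_j = a_{j+1}\text{)} \\
&= (G(1) - H(1))\, x_i(b_h) - (G(0) - H(0))\, x_i(a_1) \\
&= 0 .
\end{aligned}
```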

For any $q \in [0, 1]$, if $H(q) - G(q)$ is zero, then the contribution to the second term is zero. Thus, we
only need to consider where $G$ and $H$ differ. Since $G$ is the convex hull of $H$, whenever $G < H$, $G$ must
be linear. That is, for any $[a, b) \in \mathcal{S}$, $g(q)$ has the same value for all $q \in [a, b)$. Since $\bar{\phi}(F^{-1}(q)) = g(q)$,
every $v_i \in [F^{-1}(a), F^{-1}(b))$ has the same ironed virtual cost. Since the mechanism breaks ties uniformly
at random, $x_i(v_i)$ is constant for all $v_i \in [F^{-1}(a), F^{-1}(b))$. Thus, the derivative of $x_i(v_i)$ is zero for all
$v_i \in [F^{-1}(a), F^{-1}(b))$, and this interval contributes nothing to the second term. Hence, the second term
is always zero: whenever $H(F(v_i)) - G(F(v_i))$ is non-zero, the derivative of $x_i(v_i)$ is zero.
Therefore, the mechanism minimizes the payment. □






